













BIOMETRIKA 


A JOURNAL FOR THE STATISTICAL STUDY OF
BIOLOGICAL PROBLEMS 


FOUNDED BY 
W. F. R. WELDON, FRANCIS GALTON AND KARL PEARSON


EDITED BY 
EGON S. PEARSON 


IN CONSULTATION WITH 


HARALD CRAMÉR J. B. S. HALDANE
R. C. GEARY G. M. MORANT 
MAJOR GREENWOOD JOHN WISHART 


Reprinted by offset-litho, 1952, 1960


ISSUED BY THE BIOMETRIKA OFFICE 
UNIVERSITY COLLEGE, LONDON 


AND PRINTED AT THE 
UNIVERSITY PRESS, CAMBRIDGE 


PRINTED IN GREAT BRITAIN 


[Issued 31 January 1941] 








VOLUME XXXII. PART I JANUARY, 1941





A THEORY OF RANDOMNESS 
By M. G. KENDALL 


INTRODUCTION 


1. In two recent papers Babington Smith & I (1938 and 1939) have discussed 
the problems of sampling with random numbers and the construction of tables 
of such numbers by mechanical methods. With the publication of 100,000 
numbers (1940) what one may call the practical side of the investigation has 
come to an end. The purpose of this paper is to develop the theory of the subject 
and to put in their proper setting some of the ideas on which the practical re- 
search was based. It is divided into two parts. In the first I deal with the 
symbols and mathematics of the theory of random suites, my fundamental 
contention being that a theory of randomness can be developed within the 
framework of existing mathematical notions. The second part indicates how the 
theory is to be related to practice. 


2. Much of the following work was suggested by the treatment of von Mises
(1936) and Dörge (1934), and I take the opportunity of expressing my indebtedness
to them. The principal difference between von Mises’s views and my own
concerns his concept of the Irregular Kollektiv, or infinite random series. 
Numerous attempts have been made to show that this concept leads to a contra- 
diction and that it is therefore an improper foundation for a theory of proba- 
bility. Such attempts have mostly failed, but under pressure of the criticisms 
embodied in them the definition of the Irregular Kollektiv has been successively 
modified by von Mises’s followers until it has lost the pristine simplicity which 
was originally one of its most attractive features. I do not propose to discuss 
here the difficulties associated with the concept of the Irregular Kollektiv or the 
various expedients which have been proposed to meet them. I have tried to cut 
the Gordian knot by rejecting the concept, and the theory below accordingly 
avoids all the difficulties attendant upon it.


PART 1. THE THEORY OF RANDOM SUITES

3. I consider a finite number r of symbols $A_1, A_2, \ldots, A_r$, each of which will
be called a characteristic, and an infinite ordered series of these characteristics,
which will be called a suite. For instance, if there were two characteristics $A_1$
and $A_2$ such a suite might be

$A_1 A_2 A_1 A_2 A_1 A_2 A_1 A_2 \ldots$ (1)

where the characteristics appear alternately. Suites exist in the sense that they
can be completely specified by a law of formation, as the foregoing example
shows.


4. Definition. If the proportional frequency of each characteristic in a suite
tends to a limit in the mathematical sense, the suite is called “proper”. In the
contrary case, “improper”.

Proper suites exist; e.g. the proportional frequencies of $A_1$’s and $A_2$’s in (1)
tend each to the limit $\frac{1}{2}$.

Improper suites also exist. For example, it may be shown that if we take the 
kth digit in the logarithms to base 10 of all the integers, beginning with 1, the 
suite of the digits 0-9 so obtained is improper, since no proportional frequency 
tends to a limit. 

Suites also exist which are proper for one characteristic and improper for 
another, provided that there are more than two characteristics. For we may 
build a suite from the logarithm table in the manner just described, and then 
insert a new characteristic Q between successive digits. The proportional
frequency of Q will then tend to $\frac{1}{2}$, but those of the others will not tend to a limit.
If, however, there are two characteristics, the proportional frequencies, being 
together equal to unity, must tend to a limit together. 
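
The definition of a proper suite can be illustrated numerically on finite prefixes. The following is a minimal sketch, assuming that a suite is represented by a function giving its nth member; the names `suite1` and `proportional_frequency` are illustrative and do not come from the paper. A finite prefix can only suggest a limit and cannot establish one.

```python
from fractions import Fraction

def suite1(n):
    """nth member (1-indexed) of the alternating suite (1): A1 A2 A1 A2 ..."""
    return "A1" if n % 2 == 1 else "A2"

def proportional_frequency(suite, characteristic, length):
    """Proportional frequency of `characteristic` among the first `length` members."""
    count = sum(1 for n in range(1, length + 1) if suite(n) == characteristic)
    return Fraction(count, length)

# The frequency of A1 settles towards 1/2 as the prefix grows.
for length in (1, 3, 10, 101, 1000):
    print(length, proportional_frequency(suite1, "A1", length))
```

For odd prefix lengths the frequency of A1 exceeds 1/2 by 1/(2n), which shrinks as n grows; this is the sense in which the frequency tends to a limit.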


5. Definition. The limit of the proportional frequency of a characteristic in
a proper suite is called the probability of that characteristic in that suite.


6. Definition. By a “Selector” I mean an infinite series of positive integers 
ordered according to their magnitude. A selector, being infinite, must be 
specified by a law of formation, not by enumeration. 

There is a special class of such laws which deserves separate consideration. 
Suppose we have such a suite as this:

$A_1 A_2 A_2 A_1 A_1 A_2 A_2 A_1 A_1 \ldots$ (2)

characteristics after the first appearing alternately in pairs.

Our law of formation of the selector might be in these terms: proceed along 
the series until you come to the combination A, A, A, A,; then choose the ordinal 
number of the next following member of the series, and proceed until you again 
meet that combination; and so on. The series of ordinals so obtained is the 
selector. 

The importance of selectors of this type is that they are mathematically 
independent of the particular characteristic of the member whose ordinal 
number is chosen. By mathematically independent I mean that the value of this 
member does not appear in the law of formation, so that the same member would 
be chosen whatever its characteristic.

Definition. If a selector is constructed from a suite and, in virtue of the law
of formation, any member of the selector is mathematically independent of the
characteristic whose ordinal number in the suite is the value of that member,
the selector is said to be “disjoint” with respect to the suite.
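
A law of formation of this kind can be sketched as a generator. This is only an illustration under stated assumptions: the suite is taken to be one whose first member is followed by alternating pairs, the pattern A1 A2 A2 A1 is a hypothetical choice of combination, and the names `suite2` and `pattern_selector` are not from the paper.

```python
from itertools import islice

def suite2(n):
    """nth member (1-indexed) of A1 A2 A2 A1 A1 A2 A2 A1 ... (assumed form)."""
    return "A1" if n % 4 in (0, 1) else "A2"

def pattern_selector(suite, pattern):
    """Yield the ordinal of the member next following each occurrence of `pattern`.

    Only members before the chosen ordinal are inspected, so the choice never
    depends on the characteristic of the chosen member (a disjoint selector).
    If the pattern never occurs the generator never yields: the law is then
    degenerate for that suite.
    """
    k = len(pattern)
    target = tuple(pattern)
    n = k + 1
    while True:
        window = tuple(suite(i) for i in range(n - k, n))
        if window == target:
            yield n
        n += 1

selector = pattern_selector(suite2, ("A1", "A2", "A2", "A1"))
print(list(islice(selector, 4)))  # [5, 9, 13, 17]
```

Because the window stops one place short of the chosen ordinal, the same ordinals would be produced whatever characteristic stood at those places.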


7. It might happen that a law of formation of a disjoint selector was given 
which did not in fact lead to a selector in the case of certain suites. For example, 
with the suite (2), if we try to construct a selector by choosing ordinals corresponding
to the characteristics following three successive $A_1$’s, no ordinals
appear. Such a law I should regard as degenerate in relation to that suite, and 
I exclude it from the domain of discussion from this point onwards. Hereafter, 
in speaking of a selector in relation to a given suite I shall assume that the one 
is disjoint with respect to the other. 


8. We may now apply selectors to pick out subsets from a suite. We do so by 
choosing from the suite those members whose ordinals are the numbers appearing 
in the selector. 

Ex hypothesi, the result of this process will be a new suite of the characteristics
(some at least) of the original suite. We may call this a “derived suite”.
Symbolically, denoting the selector by the roman S and the suite by K, we may
write

D = SK. (3)


I proceed to prove one or two theorems of a negative kind about derived 
suites. 

9. A suite derived from a proper suite is not necessarily proper.

For let K = $A_1 A_2 A_1 A_2 A_1 A_2 A_1 A_2 \ldots$;

S = 1, 2, 4, 5, 7, 9, 11, 13, 15, 16, 18, 20, 22, ...,
the numbers running alternately in sets of even and odd, the number of each
kind being equal to twice the number of preceding members of the selector.
Then SK = $A_1 A_2 A_2 A_1 A_1 A_1 A_1 A_1 A_1 A_2 A_2 A_2 A_2 \ldots$.
However far we go in this series, say to the end of a run of $A_1$’s, there will follow
twice as many $A_2$’s as there have already occurred of both $A_1$’s and $A_2$’s. Clearly
the suite is improper.
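
The oscillation can be checked on a finite prefix. The sketch below assumes the selector is built exactly by the rule stated above (runs of even and odd ordinals, each run twice as long as the selector so far); the function names are illustrative. Since K is the alternating suite, an odd ordinal picks out A1 and an even ordinal picks out A2.

```python
from fractions import Fraction

def doubling_selector(limit):
    """First `limit` members of the selector 1, 2, 4, 5, 7, 9, 11, 13, 15, 16, ..."""
    members = [1]
    while len(members) < limit:
        run_length = 2 * len(members)   # twice the number of preceding members
        n = members[-1] + 1             # next ordinal has the opposite parity
        for _ in range(run_length):
            members.append(n)
            n += 2                      # stay within the same parity for the run
    return members[:limit]

def a1_frequency(members, length):
    """Proportional frequency of A1 in the first `length` members of SK."""
    odd = sum(1 for n in members[:length] if n % 2 == 1)
    return Fraction(odd, length)

selector = doubling_selector(243)
for length in (1, 3, 9, 27, 81, 243):   # ends of successive runs
    print(length, a1_frequency(selector, length))
```

At the ends of successive runs the frequency of A1 swings between values near 3/4 and values near 1/4, so it cannot tend to a limit.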

10. If we apply another selector $S_2$ to a derived suite we get a further
derived suite which we may write $S_2 S_1 K$. It is clear that this will not in general
be the same as $S_1 S_2 K$.

The “identical” selector E = 1, 2, 3, 4, ... is of some importance. Clearly it
reproduces a suite to which it is applied, and $E S_1 K = S_1 E K$, etc.


Randomness 
11. Definition. If the probability of the characteristic $A_j$ in a proper suite
is $p_j$, and if the probability of $A_j$ in a proper suite derived from it by the selector
S is also $p_j$, then the suite is said to be random for the characteristic $A_j$ with
respect to S.
Suites and selectors with this property exist. Every proper suite is random
with respect to the identical selector E, and the suite (1) is random for both $A_1$
and $A_2$ with respect to the selector

[...]

though not to the selector 1, 3, 5, 7, 9, ....
12. Definition. A suite which is random for a characteristic A with respect
to a number of selectors $S_1, S_2, \ldots, S_n$ is said to be random in the selector domain
$S_1, S_2, \ldots, S_n$.

It is to be noted that if a suite is random with respect to $S_1$ and $S_2$ it does not
follow that $S_1 K$ is random with respect to $S_2$ or $S_2 K$ with respect to $S_1$. E.g. if

K = [...],

$S_1$ = [...],

$S_2$ = the disjoint selector obtained by writing down the ordinals of characteristics
next following [...],

then $S_1 K$ = [...]

so that K is random for $A_1$ and $A_2$ with respect to both $S_1$ and $S_2$. But [...]
13. Given a suite and a certain finite set of selectors, we may consider the
suites obtained by repeated applications of groups of these selectors. This will
give us a series of derived suites which may be infinite but is nevertheless
ordered. If all the resulting suites are proper and the probability of a characteristic
A in them all is the same as that in the parent suite, the latter is said to
be completely random in the selector domain $S_1, S_2, \ldots, S_n$.


There exist suites which are completely random in certain domains. E.g. if

K = $A_1 A_2 A_1 A_2 A_1 A_2 \ldots$,
$S_1$ = 1, 4, 7, 10, ...,
then $S_1 K$ = $A_1 A_2 A_1 A_2 A_1 A_2 \ldots$ = K,

so that repeated applications of $S_1$ lead only back to the original suite.
Consider further $S_2$ = 2, 5, 8, 11, ....
Then $S_2 K$ = $A_2 A_1 A_2 A_1 A_2 A_1 \ldots$;
and $S_2^2 K$ = $A_1 A_2 A_1 A_2 A_1 A_2 \ldots$.

Thus, any number of applications of $S_1$ and $S_2$ lead either to $A_1 A_2 A_1 A_2 A_1 A_2 \ldots$
or to $A_2 A_1 A_2 A_1 A_2 A_1 \ldots$, and hence the suite is completely random for $A_1$ and
$A_2$ with respect to $S_1$ and $S_2$.

It follows at once that any suite which is derived from a completely random
suite by a selector of the set is also completely random within the same domain.
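
The closure property claimed for this example can be checked mechanically on prefixes. The following is a sketch under stated assumptions: K is taken to be the alternating suite, the two selectors are 1, 4, 7, 10, ... and 2, 5, 8, 11, ..., characteristics are coded by their subscripts, and all names are illustrative.

```python
from itertools import product

def K(n):
    """Subscript of the nth member of the alternating suite A1 A2 A1 A2 ..."""
    return 1 if n % 2 == 1 else 2

def S1(m):
    """mth member of the selector 1, 4, 7, 10, ..."""
    return 3 * m - 2

def S2(m):
    """mth member of the selector 2, 5, 8, 11, ..."""
    return 3 * m - 1

def derive(selector, suite):
    """Derived suite SK: its mth member is the member of K with ordinal S(m)."""
    return lambda m: suite(selector(m))

def prefix(suite, length=6):
    return tuple(suite(m) for m in range(1, length + 1))

def derived_prefixes(depth):
    """Prefixes of every suite obtained by at most `depth` applications of S1, S2."""
    seen = set()
    for k in range(depth + 1):
        for word in product((S1, S2), repeat=k):
            suite = K
            for selector in word:
                suite = derive(selector, suite)
            seen.add(prefix(suite))
    return seen

print(derived_prefixes(4))
```

Every composition yields one of the two alternating arrangements, in each of which both characteristics have proportional frequency 1/2. The reason is that 3m - 2 has the parity of m while 3m - 1 has the opposite parity, so the first selector reproduces an alternating suite and the second interchanges the two characteristics.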





























[...] self-contradictory, and that they fall [...] any suite is random with
respect to the identical selector 1, 2, 3, 4, .... But [...]

15. [...]










Multi-dimensional suites
17. As a simple extension of the idea of a suite of characteristics we may con- 
sider suites of sets of characteristics. Such an extension offers no difficulty, and 
is very similar to the transition from describing points on a line in terms of one 
co-ordinate to points in a multi-dimensional space by several co-ordinates. 
We may amalgamate two or more suites into a suite of more dimensions. 
E.g. with the suites $A_1 A_2 A_1 A_2 A_1 A_2 A_1 A_2 \ldots$;
$B_1 B_2 B_3 B_1 B_2 B_3 B_1 B_2 B_3 \ldots$,
we can associate the nth member of one with the nth member of the other to
give ($A_1 B_1$) ($A_2 B_2$) ($A_1 B_3$) ($A_2 B_1$) ($A_1 B_2$) ($A_2 B_3$) ....


Convolution


18. We may also construct an m-dimensional suite by dividing a one-dimensional
suite into blocks of m. This process is worth noticing. Consider the
suite $A_1 A_2 A_2 A_1 A_1 A_2 A_2 A_1 A_1 \ldots$.
This is proper and each characteristic is random with respect to the selector
1, 3, 5, 7, 9, ....

Now suppose we make a two-dimensional suite by bracketing successive terms,
thus: ($A_1 A_2$) ($A_2 A_1$) ($A_1 A_2$) ($A_2 A_1$) ....
This is proper with respect to the two two-dimensional characteristics ($A_1 A_2$)
and ($A_2 A_1$) but it is not random with respect to the selector given.

Definition. I shall refer to the process of deriving a multi-dimensional suite
from a one-dimensional suite by grouping sets of successive terms as “convolution”,
and the derived suite will be said to be “convoluted”. From the example
given it is clear that a convoluted suite is not necessarily random in the domain
of randomness of the parent suite.
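
The loss of randomness under convolution can be reproduced on prefixes. This is a minimal sketch, assuming the parent suite is A1 A2 A2 A1 A1 A2 A2 A1 ... coded by subscripts; the helper names `convolute`, `odd_selector` and `frequency` are illustrative only.

```python
from fractions import Fraction

def parent(n):
    """Subscript of the nth member (1-indexed) of A1 A2 A2 A1 A1 A2 A2 A1 ..."""
    return 1 if n % 4 in (0, 1) else 2

def convolute(suite, m):
    """Suite whose kth member is the block of members (k-1)m+1, ..., km of `suite`."""
    return lambda k: tuple(suite((k - 1) * m + i) for i in range(1, m + 1))

def odd_selector(j):
    """jth member of the selector 1, 3, 5, 7, 9, ..."""
    return 2 * j - 1

def frequency(suite, value, length, selector=None):
    """Proportional frequency of `value` in the first `length` members of the
    suite, or of the suite derived from it by `selector` when one is given."""
    pick = selector if selector is not None else (lambda j: j)
    hits = sum(1 for j in range(1, length + 1) if suite(pick(j)) == value)
    return Fraction(hits, length)

pairs = convolute(parent, 2)
# Parent suite: frequency of A1 is 1/2 with and without the selector.
print(frequency(parent, 1, 1000), frequency(parent, 1, 1000, odd_selector))
# Convoluted suite: frequency of (A1 A2) is 1/2, but becomes 1 under the selector.
print(frequency(pairs, (1, 2), 1000), frequency(pairs, (1, 2), 1000, odd_selector))
```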


Independence 


19. Definition. If a two-dimensional suite is derived from two one-dimensional
suites by attaching one member of the first to the member with the same
ordinal number in the second; if the original suites and the new suite are proper;
and if the probability in the derived suite of a characteristic ($A_j B_k$) is the product
of the probabilities of $A_j$ in the first suite and of $B_k$ in the second for all j
and k; then the two original suites are said to be statistically independent.

Definition. If from a proper two-dimensional suite there are derived two
proper one-dimensional suites by ignoring the first and then the second characteristic
of the pairs which constitute the suite; and if these two suites are
statistically independent, the two sets of characteristics are said to be statistically
independent in the original suite.







Statistical independence as thus defined concerns either suites or charac- 
teristics in suites. Like probability it is a property of aggregates, not of indi- 
viduals. 

20. The generalizations of statistical independence to the case of several 
suites or multi-dimensional suites can be made without difficulty. I shall here 
omit them and the theorems which they obey, since all the results are obvious 
extensions of the theory of class frequencies set out for the case of finite classes 
in the Introduction by Udny Yule and myself (1939). The following results are, 
however, worth recalling: 

(a) If K, L, M are proper suites, K is statistically independent of L and 
L is statistically independent of M; it does not follow that K is statistically 
independent of M. 

(b) Three suites are statistically independent only if the probabilities
($A_j B_k C_l$) are equal to the product of the probabilities $A_j$ in K, $B_k$ in L, and $C_l$
in M. It is not sufficient that they should be statistically independent pair and 
pair, as the following example shows: 

K = A, A, A, A, A, A, A, Ag...; 

L = B,B,B,B, B,B,B,B,..., 

M =C,C,C,C,C,C,C,C,.... 
Here, for instance, the probability of (A, B,C,) is zero in the suite obtained by 
associating triads from members of the suites which have the same ordinal. 
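
Result (b) can be checked on a small periodic example. The three suites used here are hypothetical and are not necessarily those printed above: they are repetitions of the periods shown, coded by subscripts, and were chosen so that every pair is independent while the triad is not. For periodic suites the limiting proportional frequencies equal the proportions within one period, so exact fractions can be used.

```python
from fractions import Fraction
from itertools import product

PERIOD = 4
K = (1, 1, 2, 2)   # hypothetical suite of A's, repeated indefinitely
L = (1, 2, 1, 2)   # hypothetical suite of B's
M = (1, 2, 2, 1)   # hypothetical suite of C's: 1 where K and L agree, else 2

def probability(suites, values):
    """Probability of the joint characteristic `values` in the associated suite."""
    hits = sum(1 for n in range(PERIOD)
               if all(s[n] == v for s, v in zip(suites, values)))
    return Fraction(hits, PERIOD)

def independent(suites):
    """True if every joint probability equals the product of the separate ones."""
    for values in product((1, 2), repeat=len(suites)):
        expected = Fraction(1)
        for s, v in zip(suites, values):
            expected *= probability((s,), (v,))
        if probability(suites, values) != expected:
            return False
    return True

print(independent((K, L)), independent((K, M)), independent((L, M)))
print(independent((K, L, M)), probability((K, L, M), (1, 1, 2)))
```

In this illustration each pair of suites satisfies the product rule, yet the triad (1, 1, 2) never occurs, whereas independence of the three would require a probability of 1/8.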


Local randomness 


21. A suite as defined above is infinite. I now consider a finite series of 
characteristics which I call a sequence. A sequence may be considered as a 
section of a suite. 

It is evident that any sequence, being finite, can form part of a suite in 
which the probabilities have any given value and which is random in any as- 
signed domain. We may, however, imagine the selectors of the domain applied 
to the sequence, that part of the selectors which contains numbers greater than 
the number of members in the sequence being ignored. Similarly, we can 
convolute the sequence in any way consistent with its size and apply selectors 
to the sequences so derived. We can compare the actual proportions of charac- 
teristics in these sequences with those in any given suite. Any such process 
I call a test. 

Definition. If the proportional frequencies in a sequence are approximately 
what they would be in a suite random with respect to the selectors of the test, 
the sequence is said to be locally random with respect to that test; and so for a 
test domain. 

To make this definition precise it is necessary to consider what is meant by
“approximately”. Suppose the sequence is of size n, and consider the $r^n$ possible
sequences of this size. To each there will correspond a proportional frequency
under the tests. Choose a number of these, $a r^n$, which may be regarded as
“approximately” the same as the proportional frequency in the suite. Then if
the given sequence is one of these, it is “approximately” the same. Clearly the
word approximately depends on the choice of the number a, which corresponds
to what is generally known as a “level of significance”.
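
One way of making the choice of the $a r^n$ sequences concrete is sketched below. It is an assumption of this sketch, not a rule stated in the paper, that the sequences regarded as “approximately” in agreement are those whose proportional frequency lies closest to the probability p; ties mean that slightly more than $a r^n$ sequences may be admitted. Only the simplest test, the frequency of one characteristic under the identical selector, is used, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
import math

def deviation(sequence, characteristic, p):
    """Absolute difference between the proportional frequency and p."""
    hits = sum(1 for c in sequence if c == characteristic)
    return abs(Fraction(hits, len(sequence)) - p)

def acceptance_threshold(characteristics, n, characteristic, p, a):
    """Smallest deviation d such that at least a * r**n of the r**n possible
    sequences of size n deviate from p by at most d."""
    deviations = sorted(deviation(s, characteristic, p)
                        for s in product(characteristics, repeat=n))
    needed = math.ceil(a * len(deviations))
    return deviations[needed - 1]

def locally_random(sequence, characteristics, characteristic, p, a):
    """True if `sequence` is among the sequences admitted at the chosen level."""
    d = acceptance_threshold(characteristics, len(sequence), characteristic, p, a)
    return deviation(sequence, characteristic, p) <= d

half, level = Fraction(1, 2), Fraction(9, 10)
print(acceptance_threshold((1, 2), 10, 1, half, level))
print(locally_random((1, 2) * 5, (1, 2), 1, half, level))
print(locally_random((1,) * 10, (1, 2), 1, half, level))
```

The alternating sequence passes this particular test, which illustrates why local randomness is always relative to a test domain: a further test using another selector would be needed to reject it.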

The concept of local randomness, in my view, is important. The series of 
characteristics which we encounter in real life are always sequences, not suites; 
and we have to estimate probabilities and random properties from finite aggre- 
gates, not from infinite series such as form the basis of the theory. Any finite 
series of characteristics whatever is random in the sense that it might arise, how- 
ever infrequently, in random sampling. But in order to make any practical use 
of our theory we have to consider certain series as non-random, or in other words
we have to judge from the local randomness of observed sequences. 


PART 2. APPLICATION OF THE THEORY TO PRACTICE
Events 
22. Events are the primary data of statistical experience. Every event has 
a number of properties, the conceptual abstractions of the Gestalts which it 
provides. These properties may be called characteristics, and it is with aggre- 
gates of characteristics that statistical inference is concerned. The throwing of 
a die and the growing of a crop on a given field are events. Characteristics of the 
former would include the number which came uppermost, the time at which the 
throw was made, the angles which the edges made with a line fixed in space, and 
so on. In general an event has an infinite number of characteristics. When we 
have a complex phenomenon such as a crop of wheat on a field it is a matter of
choice whether we regard the whole thing as one event, or look on it as a series
of associated events, e.g. a collection of crops on a number of square [...]
the event is to be regarded as including the whole of the happening and is not
synonymous with one of its characteristics. A yield of wheat is not an event,
nor is the number thrown by a die.

23. Consider then an aggregate of events. Suppose there exists a finite set 
of characteristics such that each event has one and only one characteristic of the 
set; for example, the events consisting of throws of an ordinary die must have 
one of the characteristics 1 to 6, according to the number which falls uppermost,
and cannot have more than one. We can then say that the aggregate of events
gives rise to an aggregate of characteristics. 

24. Now the aggregates of events we meet in experience are always finite.
It is true that we sometimes regard a line as composed of an infinite number of
points or a body of an infinite number of particles, but [...]
of events. The finite aggregates of our experience can be ordered to produce a
sequence; in fact they are usually arranged for us by the temporal order in
which they occur. The fundamental problem of linking theory and practice
(and an analogous problem arises in all frequency theories of probability) is to
relate the sequences of observation with the suites of theory.


25. The sequences of observation may be regarded as generated by a 
physical process. The sequence consisting of throws of a die, for example, may 
be considered as defined by the rules under which the die is cast. A sequence of 
crop yields is determined by the circumstances under which the crop was grown. 
I shall assume that the physical process generating a sequence can be depicted 
by a mathematical law defining a suite, in the same way that the “straight 
lines” we draw on paper can be depicted by the straight lines of Euclidean 
geometry, or a rigid body by the abstractions of mathematical dynamics.
Members of a sequence are ascertained by experiment; those of a suite by
calculation.








26. I also take it as empirically established that there are observational
sequences which can be adequately described as locally random sections of
suites; random, that is, in certain domains; and assume that the process
generating these sequences will, if continued, produce further sequences which
are also locally random. [...]

The probability of a characteristic is estimated [...]












limit of the proportion of heads may be }, but the limit of the proportion of cases 
in which it falls on a white square may be 3. Neither of these fractions is the 
probability of the event. They are probabilities of characteristics and there is 
nothing inconsistent in the fact that they are different. 


Independence 


28. The statistical independence of two suites, or of characteristics in a 
multi-dimensional suite, was defined in paragraph 19, and the statistical inde- 
pendence of observed sequences follows the same line. In statistics the question 
whether two series of characteristics are independent is to be determined purely 
from the experimental data. There may be very good reasons why one characteristic
is “dependent” on another in a causal sense, but if the occurrence of one
is not accompanied by the occurrence of the other in “unexpected” proportion
they are statistically independent. Contrariwise, there may be no obvious causal
nexus and yet the two may be statistically dependent. In fact, I would be inclined
to deny any separate meaning to “causal” dependence other than that of
statistical dependence (with, perhaps, allowance for the temporal element).


29. This point is important in one respect. I have up to the present spoken 
only of the independence of characteristics, not of events, and even of the former 
only in terms of suites or sequences. But in the theory of probability as expounded
in textbooks it is quite common to meet with such expressions as “a
series of independent events”, or “successive events are independent”. The
word “event” here means what I call a characteristic; but can we speak of “a
suite of independent characteristics”? I do not think so. In my opinion the
concept is equivalent to that of the Irregular Kollektiv of von Mises. 


30. For example, one would be inclined to begin an approach to a definition
of the concept by requiring that each characteristic was followed equally frequently
by all characteristics of the suite, e.g. that in a suite of characteristics
$A_1$, $A_2$, an $A_1$ was followed equally frequently by an $A_1$ or an $A_2$. But this is true
of the suite $A_1 A_1 A_2 A_2 A_1 A_1 A_2 A_2 \ldots$;
which is clearly not of the type desired. One might then require that each
characteristic should, in addition, be followed next but one by all other characteristics
in equal amount. But this is true of the suite consisting of repetitions of
A, A, A, A, A, 4,4, As3.... 

Baffled by continual examples of this kind, one might then require that the
occurrence of any characteristic was to be independent of all or any of the
characteristics which have preceded it. This, on analysis, is found to be equivalent
to the requirement that the suite shall be random with respect to all disjoint
selectors of the type considered in paragraph 6; and this is precisely the
difficulty of the Irregular Kollektiv.




31. I can see no way round this difficulty; and I therefore reject the suite of
independent events as I reject the Irregular Kollektiv. This has two important 
consequences, the first concerning the ordinary theorems of probability and the 
second concerning Bernoulli’s theorem. 

The first point may best be illustrated by an example: suppose the probability
of getting a head with a toss of a penny is $\frac{1}{2}$. What is the probability of
getting two heads with consecutive tosses? Anyone grounded in the classical
theory would answer “$\frac{1}{4}$” without hesitation. Nevertheless, the result is only
true under certain conditions. In fact the data of the problem are that there is
a suite of throws of the penny and that the proportional frequency of heads in
this suite is $\frac{1}{2}$. Now such a suite might be ($A_1$ = heads, $A_2$ = tails)

$A_1 A_2 A_1 A_2 A_1 A_2 \ldots$,

and the probability of getting two successive heads is zero. But, it may be objected,
this is an artificial series which would never occur. To this I should reply,
agreed, but why should there not occur a natural series in which the proportional
frequency of pairs of heads did not tend to $\frac{1}{4}$? It will, I hope, be clear on reflection
that there is nothing in the data of the problem to require the answer $\frac{1}{4}$
as a logical necessity unless we make some additional assumption such as this:
the occurrence of one characteristic is statistically independent of the occurrence
of the next. This contains the answer of $\frac{1}{4}$ implicitly.
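
The point can be illustrated with periodic suites, for which limiting frequencies are exact proportions within one period. The sketch below is an illustration only: the periods other than the alternating one are hypothetical examples, and successive pairs are counted overlapping (members n and n + 1).

```python
from fractions import Fraction

def heads_frequency(period):
    """Limiting proportional frequency of heads in repetitions of `period`."""
    return Fraction(period.count("H"), len(period))

def double_heads_frequency(period):
    """Limiting proportional frequency of places n at which members n and n + 1
    are both heads, in the suite made of repetitions of `period`."""
    n = len(period)
    hits = sum(1 for i in range(n)
               if period[i] == "H" and period[(i + 1) % n] == "H")
    return Fraction(hits, n)

for period in ("HT", "HHTT", "HHHTTT"):
    print(period, heads_frequency(period), double_heads_frequency(period))
```

All three suites have probability 1/2 for heads, yet the frequency of two successive heads is 0, 1/4 and 1/3 respectively, so the value 1/4 is not fixed by the data of the problem alone.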


32. In generalization of this problem we might ask: if the probability of the 
characteristic is p, what is the probability that in a set of n characteristics we 
shall get r successes and $n - r$ failures? Here, again, the answer of the classical
theory would be $\binom{n}{r} p^r (1-p)^{n-r}$; and here again the result is only true if we
assume the statistical independence of sets of n. Clearly if the result is to be 
true for all n we are once more verging on the suite of independent charac- 
teristics referred to in paragraph 30. I conclude that for statistical purposes the 
results of the classical theory of probability are not to be accepted without 
examination. If in any particular case we require one of these results, we must 
be satisfied that the suite we are considering is such as to justify the use of it. 
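
The same caution can be made concrete for sets of n. The sketch below compares the classical binomial value with the limiting proportion of sets of n containing r successes in a periodic suite convoluted into sets of n; the suite H H T T ... is a hypothetical example and the function names are illustrative. It relies on `math.comb`, which requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def binomial(n, r, p):
    """Classical value: C(n, r) p^r (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def block_distribution(period, n):
    """Limiting proportion of sets of n with r successes ('H'), r = 0..n, in the
    suite made of repetitions of `period` and divided into successive sets of n."""
    members = period * n   # holds a whole number of periods and of sets of n
    blocks = [members[i:i + n] for i in range(0, len(members), n)]
    return {r: Fraction(sum(1 for b in blocks if b.count("H") == r), len(blocks))
            for r in range(n + 1)}

p = Fraction(1, 2)
print([binomial(2, r, p) for r in range(3)])
print(block_distribution("HHTT", 2))
```

Here the probability of a success is 1/2, but sets of two contain either two successes or none, so the proportions 1/2, 0, 1/2 replace the classical 1/4, 1/2, 1/4.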


Bernoulli’s theorem 


33. The well-known theorem given by James Bernoulli in the Ars Conjectandi
is subject to similar limitations in regard to its statistical applications. In
essence the theorem is a proposition in algebra which may be stated thus: in the
binomial distribution $(p+q)^n$, if u be the sum of the greatest term and the n
preceding and n succeeding terms, the ratio of u to the sum of the remaining 
terms may be made as large as we please by increasing n sufficiently. There can 
be no criticism of this result. But, as applied to statistical series, the theorem 







states that if the probability of a characteristic is p and we observe m sets of n
events, the proportion of sets in which the proportion of successes differs from
p by less than $\epsilon$ tends to unity with large m and large n. Or put another way
the probability that [...] in a set of m differs from p by
less than $\epsilon$ tends to unity; symbolically, [...]


universe after drawing if the universe is finite. The abstraction of each member 
may then be considered as an event, whose characteristics may be noted. 
I assume that there exists for these events a set of characteristics such that each
event must bear one characteristic and can bear only one. 

We may then imagine this selective process, which I shall call sampling, as a 
generator of a sequence of any desired extent, and to be capable of continuation 
without limit. The result of unterminating sampling would be to give a suite of 
events, to which there correspond one or more suites of characteristics. In 
practice we shall have a sequence only. 


36. Definition. If a sequence obtained from a universe U by sampling is 
locally random for a characteristic A within a selector domain D, then the 
sampling is said to be random for U with respect to A within the domain D; 
and any member of the sequence is called a random sample for the characteristic 
A in the domain D. 

This definition brings out the extremely relative nature of random sampling. 
A method which is random for one universe may not be random for another; a 
method random for one characteristic may not be random for another, even in 
the same universe; and the randomness is always relative to the selector domain.
It is also to be noticed that the sampling process, being physical, can only be 
related to sequences, not to suites. 

The assumption we make in using a random sampling method is that if it has
in the past generated locally random sequences it will continue to do so in
similar circumstances. The justification for this assumption is empirical.


37. In practice we sometimes draw samples one at a time and so obtain a 
one-dimensional sequence. We then convolute this sequence into groups of n, 
making an n-dimensional sequence. But we may also draw the samples in a 
block of n (which I shall call a "clutch"). The difference is of some importance. 
If we ignore the order of the individuals in a convoluted sequence we have what 
is virtually a clutch, and it is very common in statistical work to ignore the order 
in this way. A series of sampling results, for instance, is frequently given without 
any indication of the order in which the individual results appeared. It 
should not be overlooked that certain information relevant to the randomness 
of the sample has disappeared in the process. For example, we may be told that 
in a sample of 1000 births 510 were male. We should probably conclude that there 
was nothing in the sample to show that it was not random. But if we know that 
the first 510 were male we should certainly conclude that it was not. 
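The point about lost order can be illustrated with a small sketch. It uses the number of runs (maximal blocks of like symbols) as one simple order-dependent statistic; this statistic is an assumption chosen for illustration and is not part of the argument above. The function names are hypothetical.

```python
def count_runs(seq):
    """Number of maximal blocks of identical consecutive symbols."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def expected_runs(n1, n0):
    """Mean number of runs when n1 ones and n0 zeros are arranged at random."""
    return 1 + 2 * n1 * n0 / (n1 + n0)


# 1000 births, 510 male (1) and 490 female (0), with all the males first.
ordered = [1] * 510 + [0] * 490

print(count_runs(ordered))       # runs actually observed in this ordering
print(expected_runs(510, 490))   # mean number of runs under random ordering
```

The count of males is the same for every ordering; only a statistic that looks at the order, such as this one, can distinguish the two cases described in the text.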


38. What are the grounds on which a selective process of experience is con- 
sidered to give random samples? In the first place, as has already been re- 
marked, we can only use a selective process to produce a finite sequence. This 
sequence is always locally random in some domain or other. If we find that, as 
the sequence is increased, local randomness is maintained, we may say that the 
method is random for the universe and characteristic considered. But we require 
more for a method to be used in practice. We require to be able to suppose with 
some confidence a priori that it will be random for fresh inquiries, fresh universes, 
fresh characteristics and fresh sequences. And we require the domain of random- 
ness to be as wide as possible. It was formerly the custom to assume randomness 
to any desired extent if there was no obvious reason to the contrary—a sort of 
Principle of Non-sufficient Reason. This is most unsatisfactory, and could only 
be justified if it was found in practice that haphazard methods of selection give 
locally random sequences. In fact we find that whenever any element of 
personal choice is allowed free play, bias is very liable to appear. 


39. It seems to me that we can never rid ourselves entirely of the possibility 
that a method of selection may lack randomness; but we can safeguard against 
the possibility to a great extent. For instance, the method of Random Sampling 
Numbers applied to a universe of names in a directory gives us something near 
certainty (if I may be allowed that colloquial expression) that the resulting 
samples will be random. Furthermore, we can experiment with a method to see 
if bias has appeared. If it has not, we are justified in expecting that it is random 
for the class of cases in which it has been tried. Ultimately, however, the 
assumption of randomness is part of the hypothesis which is being tested. 


40. An assumption which is usually made in practice is that the method is 
random within whatever domain happens to suit the investigator at the 
moment. One draws a random sample from the universe of inhabitants of the 
British Isles. One says that the sample is "random" without any qualification. 
Behind this lies the assumption of the Irregular Kollektiv which has been con- 
sidered from a different angle in paragraph 30. A great many statisticians would 
use such a sample to test any hypothesis about the universe which they chanced 
to encounter; they would assume that it was random in regard to height, sex, 
age or any other characteristic; they would assume that it was random under 
convolution; and they would assume the legitimacy of testing in any sampling 
distribution which happened to be convenient. All of this amounts to an 
assumption of randomness in a very wide domain, depending on a subjective 
judgment which may be quite wrong. The wider the domain, the less likely 
(again speaking colloquially) is the assumption to be justified. In practice this 
assumption frequently has to be made, and can be made without much danger 
with a good sampling method. But the greatest danger lies in the fact that the 
person making the assumption very often does not even realize that he is doing 
so. In any sampling inquiry it is necessary to ask oneself, Is the sampling method 
I am using random for the universe I am considering, for the characteristics I am 
discussing, and for the sampling distributions or tests of significance I am em- 
ploying? Randomness is relative. 




REFERENCES 


Dörge, K. (1934). "Eine Axiomatisierung der von Miseschen Wahrscheinlichkeitstheorie." 
Jber. dtsch. MatVer. 43, 39. 

Kendall, M. G. & Babington Smith, B. (1938). "Randomness and random sampling 
numbers." J.R. Statist. Soc. 101, 147. 

Kendall, M. G. & Babington Smith, B. (1939). "Second paper on random sampling 
numbers." Supp. J.R. Statist. Soc. 6, 51. 

Kendall, M. G. & Babington Smith, B. (1940). Tables of Random Sampling Numbers. 
Cambridge University Press. 

von Mises, R. (1936). Wahrscheinlichkeit, Statistik und Wahrheit. Springer, Berlin. 

Yule, G. Udny & Kendall, M. G. (1939). An Introduction to the Theory of Statistics. 
12th edition. Griffin & Co., London. 














A GEOMETRICAL ANALYSIS OF THE FREQUENCY 
DISTRIBUTION OF THE RATIO BETWEEN TWO 
VARIABLES 


By C. NICHOLSON, M.C., M.A., M.D. 


This subject has already been treated by Geary (1930), and by Fieller (1932), the 
approach to the problem in both cases being algebraical; the geometrical approach 
to the same problem suggested itself to me when I was working on a series of 
anatomical measurements (Nicholson, 1938). This paper could hardly have 
reached publication without the generous assistance of Mr N. L. Johnson of 
University College, London. 


(1) VARIABLES INDEPENDENT 
We are to consider the distribution of the ratio 

$$\frac{y+Y}{x+X},$$

where $Y$ and $X$ are constants, and the joint distribution of $x$ and $y$ is given by the 
normal bivariate surface 

$$z=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac12\left(\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}\right)\right].$$

Then if we refer our observed values $(y+Y)$ and $(x+X)$ to the axes $y=0$ and 
$x=0$, the co-ordinates of the intersection of the zero values of the variables will 
be $-X$ and $-Y$, since $x+X=0$ when $x=-X$. The ordinates for a constant 
value of the ratio will lie in a plane surface passing through this point of the 
general form 

$$y=mx+c,$$

where $m$ will have the value of the ratio, and $c$ will be $mX-Y$. The equation to 
the projection of the section of the normal surface by this plane on the $(x, z)$ 
plane will be 

$$z=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac12\left(\frac{x^2}{\sigma_x^2}+\frac{(mx+c)^2}{\sigma_y^2}\right)\right], \tag{1}$$

which can be rearranged to 

$$z=\frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac12\left\{\frac{c}{\sqrt{m^2\sigma_x^2+\sigma_y^2}}\right\}^2\right]\exp\left[-\frac12\left\{\frac{x+\dfrac{mc\sigma_x^2}{m^2\sigma_x^2+\sigma_y^2}}{\dfrac{\sigma_x\sigma_y}{\sqrt{m^2\sigma_x^2+\sigma_y^2}}}\right\}^2\right]. \tag{1a}$$

The area of this section is the integration of $z$ from $-\infty$ to $+\infty$, the variable 
(because of the change of angle) being $x\sqrt{1+m^2}$, and this is 

$$\frac{\sqrt{1+m^2}}{\sqrt{2\pi}\,\sqrt{m^2\sigma_x^2+\sigma_y^2}}\exp\left[-\frac12\left\{\frac{c}{\sqrt{m^2\sigma_x^2+\sigma_y^2}}\right\}^2\right]. \tag{2}$$

The quantity $c/\sqrt{m^2\sigma_x^2+\sigma_y^2}$ in (2) is equal to 

$$\frac{mX-Y}{\sqrt{m^2\sigma_x^2+\sigma_y^2}},$$

which is the function of the ratio which Geary treats as $t$. Now $m$ is the tangent of 
an angle, say $\beta$, and (2) may be put in the form 

$$\frac{1}{\sqrt{2\pi}\,\sqrt{\sigma_x^2\sin^2\beta+\sigma_y^2\cos^2\beta}}\exp\left[-\frac12\left\{\frac{c\cos\beta}{\sqrt{\sigma_x^2\sin^2\beta+\sigma_y^2\cos^2\beta}}\right\}^2\right]. \tag{3}$$

Here $c\cos\beta$ is the perpendicular distance from the origin of the surface to the 
plane $y=mx+c$, so that we can draw the conclusion that a series of parallel plane 
sections of a normal surface making an angle of $\beta$ with the $x$ axis are normal 
curves with a standard deviation of 

$$\frac{\sigma_x\sigma_y}{\sqrt{\sigma_x^2\sin^2\beta+\sigma_y^2\cos^2\beta}},$$

while their areas form another normal curve with the variable $c\cos\beta$ and with a 
standard deviation of $\sqrt{\sigma_x^2\sin^2\beta+\sigma_y^2\cos^2\beta}$. 
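As a numerical check on the statement about plane sections, the following sketch compares the area of a section computed from the form (3) with a direct summation of the bivariate surface along the line $y = mx + c$. The function names and the trial constants are hypothetical, chosen only for illustration.

```python
import math


def section_sd(sx, sy, beta):
    """S.d. of the normal curve cut by a plane at angle beta to the x axis."""
    return sx * sy / math.sqrt(sx**2 * math.sin(beta)**2 + sy**2 * math.cos(beta)**2)


def section_area(sx, sy, beta, c):
    """Area of the section by the plane y = m x + c, m = tan(beta), from form (3)."""
    s = math.sqrt(sx**2 * math.sin(beta)**2 + sy**2 * math.cos(beta)**2)
    p = c * math.cos(beta)  # perpendicular distance from the origin to the plane
    return math.exp(-0.5 * (p / s) ** 2) / (math.sqrt(2 * math.pi) * s)


def section_area_numeric(sx, sy, beta, c, steps=20000, span=12.0):
    """The same area by midpoint summation of the surface along the line."""
    m = math.tan(beta)
    lo, hi = -span * sx, span * sx
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        y = m * x + c
        z = math.exp(-0.5 * ((x / sx) ** 2 + (y / sy) ** 2)) / (2 * math.pi * sx * sy)
        total += z * dx
    # distance along the sloping line is x * sqrt(1 + m^2)
    return total * math.sqrt(1 + m * m)


print(section_area(2.0, 1.0, 0.6, 0.8))
print(section_area_numeric(2.0, 1.0, 0.6, 0.8))
```

The two printed values should agree closely, which is the content of the rearrangement from (1) to (3).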





(2) VARIABLES CORRELATED 

Clearly, in the case where the primary variables are not independent, it is still 
possible to reduce the distribution to this same geometrical system, so that we 
may discard reference to the primary variables and refer rather to the principal 
axes of the surface, generalizing (3) as 

$$\frac{1}{\sqrt{2\pi}\,\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\exp\left[-\frac12\left\{\frac{k\sin\theta}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\right\}^2\right], \tag{4}$$

where, referring to Fig. 1, $a$ and $b$ are the standard deviations in the direction of 
the major and minor axes respectively of a normal surface; also $k$ is the distance 
of a point $K$ from the origin, $K$ being the focus of a pencil of planes cutting the 
surface, and $k$ being equal to $\sqrt{X^2+Y^2}$. The angle $KOA$ is $\alpha$, which is the absolute 
value of the angle less than $\tfrac12\pi$ between the major axis of the surface and the line 
joining the origin to the intersection of the zero value of the variables; it is 

$$\left|\tan^{-1}(Y/X)-\delta\right|,$$

where $\delta$ is the angle between the $x$ axis and the major axis which is given by the 
equation 

$$\tan 2\delta=\frac{2r\sigma_x\sigma_y}{\sigma_x^2-\sigma_y^2}.$$

$\theta$ is the angular deviation of any plane from the angle $\alpha$, which is taken as the 
origin of the pencil, and the value of any ratio will be given by $\tan(\alpha+\theta+\delta)$. 


It is clear that if we were to confine ourselves to the distribution of the ratio 
we must use $\tan(\alpha+\theta+\delta)$ as the variable, but in this generalized form it is more 
informative to use the angle itself as the variable. The cumulative frequency for a 
deviate $\theta$ is the content of the two plane angles $PKT$ and $P'KT'$, and where $K$ lies 
without the bulk of the distribution the content of the angle $PKT$ will be negligible. 
The content of the angle $P'KT'$ may then be taken as the content between 
the two parallel planes, $TKT'$ and $SOS'$, the latter passing through the origin of 
the surface, and this is 

$$\frac{1}{\sqrt{2\pi}\,\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\int_0^{k\sin\theta}\exp\left[-\frac12\left\{\frac{u}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\right\}^2\right]du. \tag{5}$$

If in (5) we put $u/\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}$ as $t$, we get 

$$\frac{1}{\sqrt{2\pi}}\int_0^{\frac{k\sin\theta}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}}e^{-\frac12t^2}\,dt, \tag{5a}$$

where, if the variables are independent, 

$$\frac{k\sin\theta}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}=\frac{c\cos(\alpha+\theta)}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}=\frac{mX-Y}{\sqrt{m^2\sigma_x^2+\sigma_y^2}},$$

so that (5) is identical with Geary's formula. 

Fig. 1. Projection of normal bivariate surface to illustrate the geometry of 
the ratio between variables. 


Continuing to regard $\theta$ as the variable, the equation to the curve for the 
frequency distribution given by (5) is the moment of the area (4) about $K$, and this 
can be arrived at either geometrically or by differentiating (5) with regard to $\theta$; 
it is 

$$y=\frac{1}{\sqrt{2\pi}}\left\{\frac{k\left(a^2\sin\alpha\sin(\alpha+\theta)+b^2\cos\alpha\cos(\alpha+\theta)\right)}{\left(a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\right)^{3/2}}\right\}\exp\left[-\frac12\left\{\frac{k\sin\theta}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\right\}^2\right]. \tag{6}$$

Here the numerator of the factor within brackets may be put in the form 

$$k\sqrt{a^4\sin^2\alpha+b^4\cos^2\alpha}\,\sin(\alpha+\gamma+\theta), \tag{7}$$

where $\gamma$ is the absolute value of the angle which the axis conjugate to $POP'$ makes 
with the major axis of the ellipse, i.e. where $\tan\gamma=(b^2/a^2)\cot\alpha$. It is thus seen 
that the curve is limited, the range of $\theta$ being from $-(\alpha+\gamma)$ to $\pi-(\alpha+\gamma)$; at these 
angles the value of $y$ is zero. 
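A short sketch of equation (6) may help in checking these statements numerically. It evaluates the curve for the constants quoted under Fig. 2 ($a=4$, $b=1$, $k=3\sigma_\alpha$, $\alpha=80°$), locates the limits $-(\alpha+\gamma)$ and $\pi-(\alpha+\gamma)$, and sums the curve between them. The function names are hypothetical, and the midpoint summation is only an illustrative choice.

```python
import math


def density_6(theta, a, b, k, alpha):
    """Ordinate of the frequency curve of theta, equation (6)."""
    s, c = math.sin(alpha + theta), math.cos(alpha + theta)
    d = a * a * s * s + b * b * c * c
    num = k * (a * a * math.sin(alpha) * s + b * b * math.cos(alpha) * c)
    return num / (math.sqrt(2 * math.pi) * d ** 1.5) * math.exp(-0.5 * (k * math.sin(theta)) ** 2 / d)


def gamma_angle(a, b, alpha):
    """Angle gamma with tan(gamma) = (b^2 / a^2) cot(alpha), for 0 < alpha < pi/2."""
    return math.atan2(b * b * math.cos(alpha), a * a * math.sin(alpha))


def total_between_limits(a, b, k, alpha, steps=20000):
    """Midpoint sum of (6) from -(alpha + gamma) to pi - (alpha + gamma)."""
    lo = -(alpha + gamma_angle(a, b, alpha))
    dt = math.pi / steps
    return sum(density_6(lo + (i + 0.5) * dt, a, b, k, alpha) for i in range(steps)) * dt


a, b, alpha = 4.0, 1.0, math.radians(80.0)
sigma_alpha = a * b / math.sqrt(a * a * math.sin(alpha) ** 2 + b * b * math.cos(alpha) ** 2)
k = 3.0 * sigma_alpha
g = gamma_angle(a, b, alpha)

print(round(-math.degrees(alpha + g), 1), round(180.0 - math.degrees(alpha + g), 1))
print(total_between_limits(a, b, k, alpha))
```

The total falls short of unity by the amount neglected in taking (5) for the cumulative frequency, which is small when $K$ lies well outside the bulk of the surface.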

It should be noted that the content between the planes $SOS'$ and $TKT'$ is 
equal to the content of the angle $P'KT'$ minus the content of the angle $PKT$, so 
that in the integration the value of the content of the angle $PKT$ is twice neglected. 
It follows that if we integrate (6) between the limiting values of $\theta$ the total amount 
neglected is twice that part of the surface which lies beyond a plane $GKG'$ which 
passes through $K$ and makes an angle of $\gamma$ with the major axis; the value is 

$$\frac{2}{\sqrt{2\pi}}\int_{k/\sigma_\alpha}^{\infty}e^{-\frac12t^2}\,dt,$$

where $\sigma_\alpha$ is the standard deviation of the normal curve given by a plane section 
of the normal surface which makes an angle of $\alpha$ with the major axis, i.e. where 

$$\sigma_\alpha=\frac{ab}{\sqrt{a^2\sin^2\alpha+b^2\cos^2\alpha}}.$$





(3) CURVES GIVEN BY THIS EQUATION 


The curves generated by this equation are of very great variety, the majority 
being bell-shaped, and we may now discuss the effects produced by varying the 
constants. It should be noted that, as POP’ bisects the normal surface, the origin 
of the curve is neither the mean nor the mode but the median. 

"$k$" may have any value from $3\sigma_\alpha$ or $4\sigma_\alpha$ up to infinity; as $k$ tends to infinity 
the distribution of the standardized variable $\theta$ tends to normality with a standard 
deviation of 

$$\frac{1}{k}\sqrt{a^2\sin^2\alpha+b^2\cos^2\alpha}.$$

As $k$ decreases in value the departure from the form of the normal curve becomes 
more marked. 


"$a/b$" may have any value from unity to infinity. When the value is unity the 
curve is very similar in form to the normal curve; as the value of $a/b$ increases 
departure from the form of the normal curve increases. 

Fig. 2. Curve from equation (6). Constants: $a=4$, $b=1$, $k=3\sigma_\alpha$, $\alpha=80°$. 
(Abscissa: degrees, from $-80.6$ to $99.4$.) 

On the value of $\alpha$ the symmetry of the curve depends; $\alpha$ may have any value 
between 0 and $\tfrac12\pi$. At zero the curve is symmetrical but steeper than the normal 
curve. As $\alpha$ increases asymmetry develops (asymmetry being taken to mean the 
excentricity of the median), the maximum excentricity being reached at a value 
of $\tan^{-1}(b/a)$; thereafter the curve returns gradually to symmetry at a value of 
$\tfrac12\pi$, where it is flatter than the normal curve. Skewness develops with asymmetry 
but more slowly, and the maximum skewness is not reached until $\alpha$ has a value of 
jn. 

If $k$ is relatively small, i.e. is near $3\sigma_\alpha$, and $a/b$ is large, and if $\alpha$ is near $\tfrac12\pi$, we 
get a curve with a maximum value on each side of the median, symmetrical when 
$\alpha$ is $\tfrac12\pi$. This distribution does occasionally arise in practice; an example is given 
by Udny Yule (1932). Fig. 2 is an example of a slightly asymmetrical curve of 
this type. 

(4) A GENERAL SOLUTION 

$K$ may very well occupy a position within the bulk of the surface; this must 
happen when both of the variables have negative values. If we can obtain an 
equation of the curve in this case, it should have a general application to all values 
of $k$. Geary stated this problem but did not proceed to its solution; Fieller gave a 
general algebraical solution. 
We may consider $K$ as lying on the circumference of the ellipse 

$$\frac{x^2}{a^2}+\frac{y^2}{b^2}=h^2,$$

where $h=k/\sigma_\alpha$, and the ordinate on the circumference of the ellipse is 

$$\frac{1}{2\pi ab}\,e^{-\frac12h^2}.$$

In terms of the primary variables, 

$$h^2=\frac{1}{1-r^2}\left(\frac{X^2}{\sigma_x^2}-\frac{2rXY}{\sigma_x\sigma_y}+\frac{Y^2}{\sigma_y^2}\right).$$

As before, the frequency curve of the angle $\theta$ is given by the positive moment of the 
normal curve (4) about the ordinate at $K$. This moment may be considered in two 
parts, the moment arising from the portion of the curve without the ellipse (A), 
and the moment arising from the portion of the curve within the ellipse on which 
$K$ lies (B). 

(A) For any normal curve the moment about the origin for that part of the 
curve beyond the ordinate at a given deviation $h\sigma$ is 

$$y_0\int_{h\sigma}^{\infty}x\,e^{-\frac12x^2/\sigma^2}\,dx,$$

($y_0$ being the ordinate at the origin of the curve) which is equal to $y_0\sigma^2e^{-\frac12h^2}$. 

That is to say the moment is a function of the ordinate at the given deviation and 
of the standard deviation. Moreover the moment about this ordinate of the two 
equal tails of the curve is equal to the moment of these tails about the origin of the 
curve, so that in our case the required moment is 

$$\frac{e^{-\frac12h^2}}{\pi}\,\frac{ab}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}. \tag{8}$$

(B) The length of a chord of the ellipse which passes through $K$ and makes an 
angle of $\alpha+\theta$ with the major axis is 

$$\frac{2hab}{\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\sqrt{1-\left\{\frac{k\sin\theta}{h\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\right\}^2}, \tag{9}$$

and if we make 

$$\sin\phi=\frac{k\sin\theta}{h\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}, \tag{10}$$

we may put (9) as $2h\cos\phi\,\sigma_{(\alpha+\theta)}$. The total area of the normal curve in this plane 
using (4) is 

$$\frac{1}{\sqrt{2\pi}\,\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}\,e^{-\frac12(h\sin\phi)^2},$$

so that the ordinate at its origin is 

$$\frac{1}{2\pi ab}\,e^{-\frac12(h\sin\phi)^2},$$


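and the area of the portion within the ellipse is 

$$\frac{1}{2\pi ab}\,e^{-\frac12(h\sin\phi)^2}\int_{-h\cos\phi\,\sigma_{(\alpha+\theta)}}^{h\cos\phi\,\sigma_{(\alpha+\theta)}}\exp\left[-\frac12\left(\frac{v}{\sigma_{(\alpha+\theta)}}\right)^2\right]dv. \tag{11}$$

This is multiplied by $h\cos\phi\,\sigma_{(\alpha+\theta)}$, giving 

$$\frac{1}{2\pi ab}\,e^{-\frac12(h\sin\phi)^2}\,\sigma_{(\alpha+\theta)}^2\,h\cos\phi\int_{-h\cos\phi}^{h\cos\phi}e^{-\frac12u^2}\,du, \tag{12}$$

which may be simplified into 

$$\frac{e^{-\frac12h^2}}{\pi}\,\frac{ab}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}\left\{(h\cos\phi)^2+\frac{1}{1\cdot3}(h\cos\phi)^4+\frac{1}{1\cdot3\cdot5}(h\cos\phi)^6+\frac{1}{1\cdot3\cdot5\cdot7}(h\cos\phi)^8+\dots\right\}. \tag{12a}$$

Adding (8) and (12a), we have 

$$y=\frac{e^{-\frac12h^2}}{\pi}\,\frac{ab}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}\left\{1+(h\cos\phi)^2+\frac{1}{1\cdot3}(h\cos\phi)^4+\frac{1}{1\cdot3\cdot5}(h\cos\phi)^6+\frac{1}{1\cdot3\cdot5\cdot7}(h\cos\phi)^8+\dots\right\} \tag{13}$$

as the equation of the curve. This expression converges quite rapidly for values of 
$h$ up to 3; thereafter convergence becomes very slow indeed. 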
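The series in braces in (13) can be compared with its closed form, $1+x\,e^{\frac12x^2}\int_0^x e^{-\frac12u^2}du$ with $x=h\cos\phi$, as in the following sketch. The use of the error function to evaluate the integral, the function names, and the number of terms are assumptions made for illustration. The last function sums the curve over the whole range of $\phi$, using $d\phi/d\theta = ab/(a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta))$, to check that the total frequency is unity.

```python
import math


def bracket_series(x, terms=200):
    """1 + x^2 + x^4/(1.3) + x^6/(1.3.5) + ..., the series in braces in (13)."""
    total = 1.0
    term = x * x                      # first term of the inner-ellipse part
    for n in range(terms):
        total += term
        term *= x * x / (2 * n + 3)   # bring in the next odd factor 3, 5, 7, ...
    return total


def bracket_closed(x):
    """The same quantity as 1 + x e^{x^2/2} * integral_0^x e^{-u^2/2} du."""
    integral = math.sqrt(math.pi / 2) * math.erf(x / math.sqrt(2))
    return 1.0 + x * math.exp(0.5 * x * x) * integral


def total_frequency(h, steps=20000):
    """Midpoint sum of e^{-h^2/2}/pi * bracket(h cos phi) over -pi/2 < phi < pi/2."""
    dphi = math.pi / steps
    s = 0.0
    for i in range(steps):
        phi = -math.pi / 2 + (i + 0.5) * dphi
        s += bracket_closed(h * math.cos(phi)) * dphi
    return math.exp(-0.5 * h * h) / math.pi * s


for x in (0.5, 1.0, 2.0, 3.0):
    print(x, bracket_series(x), bracket_closed(x))
print(total_frequency(2.0))
```

For large $h$ the closed form is the more convenient route, since the number of series terms needed grows with $h^2$.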
Reverting to (10), the value of $\sin\phi$ may be put in this form 

$$\sin\phi=\frac{ab\sin\theta}{\sqrt{a^2\sin^2\alpha+b^2\cos^2\alpha}\,\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}. \tag{10a}$$

From this the following identities may be established: 

$$\cos\phi=\frac{a^2\sin\alpha\sin(\alpha+\theta)+b^2\cos\alpha\cos(\alpha+\theta)}{\sqrt{a^2\sin^2\alpha+b^2\cos^2\alpha}\,\sqrt{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}}, \tag{14}$$

$$\frac{d\phi}{d\theta}=\frac{ab}{a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)}, \tag{15}$$

$$\phi=\tan^{-1}\{(a/b)\tan(\alpha+\theta)\}-\tan^{-1}\{(a/b)\tan\alpha\}. \tag{16}$$

It should be noted that the limit of the integral in (5a) is $h\sin\phi$ and that the value 
of $\phi$ in (16) makes the practical work of calculating a series of these limits very 
simple. If the distribution of $\theta$ is still to be regarded as round the median at $\alpha$, 
$\theta$ will have the limits as before, $-(\alpha+\gamma)$ and $\pi-(\alpha+\gamma)$, and at these limits $\phi$ will 
be $\mp\tfrac12\pi$; generally, however, $\theta$ may be regarded as having a range of $\pi$ beginning 
at any value, in which case $\phi$ will have the same range but with a different 
distribution. 

A geometrical construction to show the relationship between $\theta$ and $\phi$ is given 
in Fig. 3, where $OA$ and $OB$ are equal to $a$ and $b$ respectively, and $AO'B$ is a 
right-angled isosceles triangle on $AB$ as hypotenuse. 





Here $AP=\dfrac{a\sin\alpha}{\sin OPA}$ 
and $PB=\dfrac{b\cos\alpha}{\sin OPA}$, 
so that $AP/PB=(a/b)\tan\alpha$; 
by the same reasoning $AP/PB=\tan AO'P$, 
similarly $AO'Q=\tan^{-1}\{(a/b)\tan(\alpha+\theta)\}$; 
therefore $PO'Q=\phi$. 

This relationship between $\theta$ and $\phi$ shows that the solution consists essentially 
in referring an asymmetrical system to an equivalent system where the standard 
deviations are equal. 

Fig. 3. Diagram to illustrate the relationship between $\theta$ and $\phi$. 


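If we now turn back to (6), this can be put in the form 

$$\frac{1}{\sqrt{2\pi}}\,\frac{hab\left(a^2\sin\alpha\sin(\alpha+\theta)+b^2\cos\alpha\cos(\alpha+\theta)\right)}{\left(a^2\sin^2\alpha+b^2\cos^2\alpha\right)^{1/2}\left(a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\right)^{3/2}}\,e^{-\frac12(h\sin\phi)^2}, \tag{17}$$

so that we have the approximation of (6) to (13) 

$$\frac{e^{-\frac12h^2}}{\sqrt{2\pi}}\,\frac{d\phi}{d\theta}\,h\cos\phi\,e^{\frac12(h\cos\phi)^2}\approx\frac{e^{-\frac12h^2}}{\pi}\,\frac{d\phi}{d\theta}\left\{1+h\cos\phi\,e^{\frac12(h\cos\phi)^2}\int_0^{h\cos\phi}e^{-\frac12u^2}\,du\right\}. \tag{18}$$

It will be seen that the last expression may be written 

$$\frac{e^{-\frac12h^2}}{\sqrt{2\pi}}\,\frac{d\phi}{d\theta}\,h\cos\phi\,e^{\frac12(h\cos\phi)^2}+\frac{e^{-\frac12h^2}}{\pi}\,\frac{d\phi}{d\theta}\left(1-h\cos\phi\,e^{\frac12(h\cos\phi)^2}\int_{h\cos\phi}^{\infty}e^{-\frac12u^2}\,du\right). \tag{19}$$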
The difference between the values of $y$ given by the two equations (if we do not take 
into account the value of $d\phi/d\theta$) is given in the table below for different values 
of $h$ and for $\phi=\tfrac12\pi$ and 0, showing that (6) must be a poor approximation to (13) 
when $h$ is less than 3, but that beyond that figure the difference rapidly becomes 
negligible. 

        Difference between (6) and (13) 

   h      φ = ½π        φ = 0 
   1      0.193665      0.066475 
   2      0.043084      0.006776 
   3      0.003536      0.000305 
   4      0.000107      0.000005 
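The tabulated differences correspond to the second term of (19) with the factor $d\phi/d\theta$ omitted. A minimal sketch of that term follows; the use of the complementary error function for the tail integral and the function names are assumptions for illustration. Recomputed values need not match every printed decimal of the table.

```python
import math


def upper_tail_integral(x):
    """Integral of e^{-u^2/2} from x to infinity."""
    return math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))


def difference(h, phi):
    """(13) minus (6) with d(phi)/d(theta) omitted: the second term of (19)."""
    x = h * math.cos(phi)
    correction = 1.0 - x * math.exp(0.5 * x * x) * upper_tail_integral(x)
    return math.exp(-0.5 * h * h) / math.pi * correction


for h in (1, 2, 3, 4):
    print(h, round(difference(h, math.pi / 2), 6), round(difference(h, 0.0), 6))
```

At $\phi=\tfrac12\pi$ the term reduces to $e^{-\frac12h^2}/\pi$, which makes the rapid decline with $h$ plain.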

















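Fieller's general solution of the problem is given in his formula (24) 

$$\begin{aligned}\psi(v)={}&\frac{1}{\pi}\,\frac{\sigma_x\sigma_y\sqrt{1-r^2}}{\sigma_y^2-2rv\sigma_x\sigma_y+v^2\sigma_x^2}\exp\left[-\frac{1}{2(1-r^2)}\left(\frac{\bar x^2}{\sigma_x^2}-\frac{2r\bar x\bar y}{\sigma_x\sigma_y}+\frac{\bar y^2}{\sigma_y^2}\right)\right]\\ &+\frac{\sigma_y(r\bar y\sigma_x-\bar x\sigma_y)+v\sigma_x(r\bar x\sigma_y-\bar y\sigma_x)}{\pi\left(\sigma_y^2-2rv\sigma_x\sigma_y+v^2\sigma_x^2\right)^{3/2}}\exp\left[-\frac12\,\frac{(\bar y-v\bar x)^2}{\sigma_y^2-2rv\sigma_x\sigma_y+v^2\sigma_x^2}\right]\\ &\quad\times\int_0^{\frac{\sigma_y(r\bar y\sigma_x-\bar x\sigma_y)+v\sigma_x(r\bar x\sigma_y-\bar y\sigma_x)}{\sigma_x\sigma_y\sqrt{(1-r^2)\left(\sigma_y^2-2rv\sigma_x\sigma_y+v^2\sigma_x^2\right)}}}e^{-\frac12u^2}\,du.\end{aligned}$$

$v$ is the value of the ratio under consideration which may be put as $\tan(\alpha+\theta+\delta)$, 
and if we reduce Fieller's formula with its symbols based on the co-ordinates of 
the primary variables ($\bar x$, $\bar y$ being Fieller's symbols for the quantities here written 
$X$, $Y$) to the system with its symbols based on the principal axes, 
we get the following identities: 

$$\sigma_y^2-2rv\sigma_x\sigma_y+v^2\sigma_x^2=(1+v^2)\left(a^2\sin^2(\alpha+\theta)+b^2\cos^2(\alpha+\theta)\right),$$

$$1-r^2=\frac{a^2b^2}{\sigma_x^2\sigma_y^2},$$

$$\frac{\bar x^2}{\sigma_x^2}-\frac{2r\bar x\bar y}{\sigma_x\sigma_y}+\frac{\bar y^2}{\sigma_y^2}=\frac{k^2\left(a^2\sin^2\alpha+b^2\cos^2\alpha\right)}{\sigma_x^2\sigma_y^2},$$

$$(\bar y-v\bar x)^2=(1+v^2)\,k^2\sin^2\theta,$$

$$\sigma_y(r\bar y\sigma_x-\bar x\sigma_y)+v\sigma_x(r\bar x\sigma_y-\bar y\sigma_x)=k\,(1+v^2)^{1/2}\left(a^2\sin\alpha\sin(\alpha+\theta)+b^2\cos\alpha\cos(\alpha+\theta)\right),$$

so that Fieller's formula as a whole reduces to 

$$\psi(v)=\frac{e^{-\frac12h^2}}{\pi(1+v^2)}\,\frac{d\phi}{d\theta}+\frac{e^{-\frac12(h\sin\phi)^2}}{\pi(1+v^2)}\,\frac{d\phi}{d\theta}\,h\cos\phi\int_0^{h\cos\phi}e^{-\frac12u^2}\,du.$$

Here $dv/d\theta=1+v^2$, so that the distribution $\psi(v)$ of $v$ given by Fieller's equation 
is equivalent to the distribution of $\theta$ in the sum of (8) and (12). 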

The curves arising from (13) are of very great variety of form. They are all 
limited, indeed they are more properly described as cyclical, the ordinates at the 
limits for $\theta$, since $\phi$ then equals $\mp\tfrac12\pi$, being both equal to 

$$\frac{e^{-\frac12h^2}}{\pi}\,\frac{a^4\sin^2\alpha+b^4\cos^2\alpha}{ab\left(a^2\sin^2\alpha+b^2\cos^2\alpha\right)}. \tag{20}$$
That being so there is no necessity to regard the curve as beginning and ending at 
the limiting values for $\theta$; in practice it would probably be taken to begin either 
where $\alpha+\theta$ is zero or where the ratio under consideration has a value of $-\infty$, i.e. 
where $\alpha+\theta+\delta$ is $-\tfrac12\pi$. The unit deviation in both equations (6) and (13) is the 
radian; for practical work 2/n would probably be used which would require a 
corresponding alteration in the equations. If we continue to regard the curve as 
drawn with the median at $\alpha$, we find that when $h$ is zero it is U-shaped except when 
$\alpha$ is small, and is more or less asymmetrical depending on the value of $\alpha$. With 
values of $h$ about 3 it is possible to produce curves of a highly asymmetrical type; 
as $h$ increases beyond 3 the curve reverts to the bell-shape and the limits recede 
from the bulk of the curve until, as we saw, when $h$ is infinite the curve becomes 
the normal curve and is unlimited. Fig. 4 is an example of the development of the 
curve as $h$ increases from zero to 3. 

Fig. 4. Four curves from equations (13) to illustrate the change in form as $h$ increases 
from 0 to 3. The curves commence where $\alpha+\theta=0$. Constants: $a=2$, $b=1$, $\alpha=45°$. 
(Panels: $h=0$, $h=1$, $h=2$, $h=3$; abscissa from $-45°$ to $135°$, ordinate from 0 to 0.15.) 
















(5) INTEGRATION 


The value of a frequency is the integration of $y$ in (13) with respect to the 
variable $\theta$ and, making use of (15), this is 

$$\int y\,d\theta=\frac{e^{-\frac12h^2}}{\pi}\int\left\{1+(h\cos\phi)^2+\frac{1}{1\cdot3}(h\cos\phi)^4+\frac{1}{1\cdot3\cdot5}(h\cos\phi)^6+\frac{1}{1\cdot3\cdot5\cdot7}(h\cos\phi)^8+\dots\right\}d\phi. \tag{21}$$

An expression for this integral may be obtained by converting the series in 
powers of $\cos\phi$ into a series of cosines of multiples of $\phi$; integration then gives a 
series in sines of multiples of $\phi$; but the functions of $h$ which are the factors in the 
series are very complicated and do not lend themselves to easy computation, so 
that it is better to reconvert into a series in powers of $\cos\phi$, and (putting $m$ for 
$\tfrac12h^2$) this is 

$$\int y\,d\theta=\frac{\phi}{\pi}+\frac{\sin\phi\cos\phi}{\pi}\left\{(1-e^{-m})+\frac{2}{3}\left(1-e^{-m}-e^{-m}m\right)\cos^2\phi+\frac{2\cdot4}{3\cdot5}\left(1-e^{-m}-e^{-m}m-e^{-m}\frac{m^2}{2!}\right)\cos^4\phi+\frac{2\cdot4\cdot6}{3\cdot5\cdot7}\left(1-e^{-m}-e^{-m}m-e^{-m}\frac{m^2}{2!}-e^{-m}\frac{m^3}{3!}\right)\cos^6\phi+\dots\right\}, \tag{22}$$

where the occurrence of the terms of Poisson's series is very interesting. 
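The series (22) can be checked against a direct summation of (21), taken from $\phi=0$ (the median). In the sketch below the function names, the number of terms, and the midpoint rule are assumptions for illustration; the integrand of (21) is evaluated through the closed form $1+x\,e^{\frac12x^2}\int_0^x e^{-\frac12u^2}du$ with $x=h\cos\phi$.

```python
import math


def cumulative_22(phi, h, terms=60):
    """Frequency between the median (phi = 0) and phi, from the series (22)."""
    m = 0.5 * h * h
    poisson_term = math.exp(-m)   # e^{-m} m^j / j!, starting at j = 0
    tail = 1.0                    # 1 minus the Poisson terms used so far
    coef = 1.0                    # (2.4...2j) / (3.5...(2j+1))
    cos_power = 1.0               # cos^{2j}(phi)
    c2 = math.cos(phi) ** 2
    total = 0.0
    for j in range(terms):
        tail -= poisson_term
        total += coef * tail * cos_power
        poisson_term *= m / (j + 1)
        coef *= (2 * j + 2) / (2 * j + 3)
        cos_power *= c2
    return phi / math.pi + math.sin(phi) * math.cos(phi) / math.pi * total


def cumulative_21(phi, h, steps=20000):
    """The same frequency by midpoint summation of the integrand of (21)."""
    dphi = phi / steps
    s = 0.0
    for i in range(steps):
        x = h * math.cos((i + 0.5) * dphi)
        s += 1.0 + x * math.exp(0.5 * x * x) * math.sqrt(math.pi / 2) * math.erf(x / math.sqrt(2))
    return math.exp(-0.5 * h * h) / math.pi * s * dphi


print(cumulative_22(0.7, 2.0), cumulative_21(0.7, 2.0))
print(cumulative_22(math.pi / 2, 2.0))
```

Each bracket in (22) is the upper tail of a Poisson series with mean $m$, which is why the coefficients fall away quickly once the index exceeds $m$.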


(6) CONCLUSION 


The frequency distribution of the quotient of two normal variables may, then, 
give rise to most of the forms which are met with in statistical work. It is not, 
however, suggested that such statistical distributions always arise in this precise 
fashion; at the same time, from geometrical considerations, it seems likely that 
the product of two variables would produce a similar set of curves. It is not 
impossible that a large number of primary variables might group themselves into 
two secondary variables of approximately normal distribution and that the final 
distribution is some function involving either the quotient or the product of these 
variables. However that may be, the fitting of this curve to any given distribution 
appears to present many difficulties and is quite beyond the scope of this paper. 


(7) EXAMPLE 


The following example illustrates the practical use of $\phi$ in the application of 
Geary's approximation. Some of the difficulties of childbirth are undoubtedly due 
to a disproportion between the size of the foetal head and the size of the bony 
opening through which it has to pass, the brim of the pelvis. This difficulty 
becomes absolute, i.e. demands caesarian section, in about 1% of all cases (in 
Guy's Hospital (1937) ten cases were dealt with by caesarian section on account 
of disproportion out of 990 pregnancies in a fair sample of the population); it 
would be well to know for the purposes of prognosis the percentage ratio between 
the size of the passenger and the size of the passage beyond which spontaneous 
delivery becomes impossible. The foetal head, as it passes, is roughly circular in 
section and the area of the maximum section may be calculated from the biparietal 
diameter; the figures for this diameter are taken from a series of 1010 measurements 
by Ince (1939). It has been shown that the area of the pelvic opening is given 
to a close enough approximation by the area of the ellipse on its antero-posterior 
(conjugate) and transverse diameters; the figures for these are taken from a series 
of 350 measurements made by radiology (Nicholson, 1938). It might be well to 
add that the radiological method used (Nicholson, 1936) has a probable error of 
accuracy as low as a millimetre. 
These figures are 

                               Biparietal   Conjugate   Transverse 
Mean (mm.)                        91.5        116.4       132.3 
Standard deviation (mm.)           4.0         10.5         7.6 
Coefficient of variation           4.4          9.0         5.8 

The distribution of these variables is normal, and the two latter are independent, 
so that we can estimate the following figures for the two areas: 

                               Foetal head   Maternal pelvis 
                                   (y)             (x) 
Mean (sq. cm.)                    65.8           121.0 
Standard deviation (sq. cm.)       5.8            12.9 
Coefficient of variation           8.8            10.7 

The distribution of these variables is not, theoretically, normal but the error 
from assuming normality will be negligible; we shall also assume that they are 
independent, an assumption which is apparently not unreasonable. We may now 
calculate the following constants for the frequency curve for the ratio: 
From the tables of the normal curve we get the deviate for a frequency of 
1% as 2.3263, and applying (5a), (10), and (16), we have 

$$\begin{aligned}h\sin\phi&=2.3263,\\ \sin\phi&=0.15804,\\ \phi&=9°\,6',\\ \tan^{-1}\{(a/b)\tan\alpha\}&=50°\,25',\\ \tan^{-1}\{(a/b)\tan(\alpha+\theta)\}&=59°\,31',\\ (a/b)\tan(\alpha+\theta)&=1.69879,\\ \tan(\alpha+\theta)&=0.764.\end{aligned}$$

The required percentage ratio is then 76.4%. Using the usual approximation 
$(Y/X)\{1+(\sigma_x/X)^2\}$, the mean of the ratio would be 55.5%, and its standard 
deviation, $(Y/X)\sqrt{(\sigma_x/X)^2+(\sigma_y/Y)^2}$, 7.5%; if we had assumed that the distribution 
of the ratio was normal, we should have got a result of 72.9%; so that, even when $h$ 
is quite high, the distribution of the tails of the curve may be far from normal. 
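The steps of this calculation can be retraced in a few lines. The sketch below assumes independent variables with $\sigma_x\geq\sigma_y$, so that $a=\sigma_x$, $b=\sigma_y$ and $\delta=0$; the constants $a$, $b$, $k$, $\alpha$ and $h$ are recomputed from the table of areas rather than quoted from the paper, and the function name is hypothetical.

```python
import math


def ratio_at_deviate(X, Y, sx, sy, t):
    """Ratio tan(alpha + theta) at which the normal deviate h*sin(phi) equals t.

    Assumes independent variables with sx >= sy, so that the major axis lies
    along x (a = sx, b = sy, delta = 0). Returns (ratio, h, phi).
    """
    a, b = sx, sy
    alpha = math.atan2(Y, X)          # angle of OK with the major axis
    k = math.hypot(X, Y)              # distance of K from the origin
    sigma_alpha = a * b / math.sqrt(a * a * math.sin(alpha) ** 2 + b * b * math.cos(alpha) ** 2)
    h = k / sigma_alpha
    phi = math.asin(t / h)            # from h sin(phi) = t, the limit in (5a)
    # equation (16) rearranged for tan(alpha + theta)
    psi = phi + math.atan((a / b) * math.tan(alpha))
    return (b / a) * math.tan(psi), h, phi


ratio, h, phi = ratio_at_deviate(X=121.0, Y=65.8, sx=12.9, sy=5.8, t=2.3263)
print(round(100 * ratio, 1), round(h, 2), round(math.degrees(phi), 2))
```

With these inputs the ratio comes out close to the 76.4% given above; small differences in the later figures are to be expected from rounding in the intermediate angles.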

The value of the figure 76.4 from the point of view of prognosis is that we can 
now predict that unless an event has occurred, the chances against which are 
99 to 1, a pelvis with an area of 110 sq. cm. can pass 99.9% of foetal heads, that 
a pelvis of 100 sq. cm. can pass 97%, that a pelvis of 90 sq. cm. can pass 70%, but 
that a pelvis of 80 sq. cm. can pass no more than 21% of foetal heads. 


REFERENCES 


Fieller, E. C. (1932). Biometrika, 24, 428. 

Geary, R. C. (1930). J. R. Statist. Soc. 93, 442. 

Guy's Hospital (1937). Clinical Report of the Maternity Department. 

Ince, J. G. Hastings (1939). J. Obstet. Gynaec. 46, 1003. 

Nicholson, C. (1936). Lancet, no. 231, 615. 

Nicholson, C. (1938). J. Obstet. Gynaec. 45, 950. 

Udny Yule, G. (1932). Theory of Statistics, 10th ed. London: Griffin, 78. 





THE STATISTICAL SIGNIFICANCE OF 
CANONICAL CORRELATIONS 


By M. S. BARTLETT 


1. In an important paper published in this Journal, Hotelling (1936) has 
shown that the generalized variance matrix* 

$$\mathbf{V}=\begin{pmatrix}\mathbf{V}_{11}&\mathbf{V}_{12}\\ \mathbf{V}_{21}&\mathbf{V}_{22}\end{pmatrix}$$

of a vector variate $\mathbf{x}$ which has been partitioned into two parts $\mathbf{x}_1$ and $\mathbf{x}_2$ with, 
say, $q$ and $p$ components, can, by appropriate linear transformations $L_1\mathbf{x}_1$ and 
$L_2\mathbf{x}_2$ of $\mathbf{x}_1$ and $\mathbf{x}_2$, be thrown into the canonical form 

$$\begin{pmatrix}L_1\mathbf{V}_{11}L_1'&L_1\mathbf{V}_{12}L_2'\\ L_2\mathbf{V}_{21}L_1'&L_2\mathbf{V}_{22}L_2'\end{pmatrix}=\begin{pmatrix}1&R\\ R'&1\end{pmatrix}.$$

$R$ is a rectangular matrix which is zero except for a leading diagonal of squares 
$\lambda_i^2$ of canonical correlations. 

Similar operations on the estimated matrix variance $V$ give rise to estimated 
canonical correlations $l_i$, which measure the correlations between estimates of the 
linear functions $L_1\mathbf{x}_1$ and $L_2\mathbf{x}_2$. While Hotelling has given asymptotic standard 
errors for the coefficients $l_i$, it is known that the significance of these correlations, 
as in the simple case $p=1$, is more generally to be interpreted as the significance 
of the regression relations of $\mathbf{x}_2$ with $\mathbf{x}_1$, the validity of any exact tests of significance 
depending on the supposition that the dependent variate $\mathbf{x}_2$, apart from 
its linear dependence on $\mathbf{x}_1$, is normal. 

* A matrix is usually denoted by a capital letter, and if it has both a population and sample 
value, the population value is given in heavier type (cf. Bartlett, 1939). The transpose of any 
matrix $A$ is denoted by $A'$. A matrix with only one column is a vector, and is often denoted by a 
small letter. To avoid confusion, a vector variate $\mathbf{x}$ is written in heavier type throughout, to 
distinguish it from a single variate $x$. If $\mathbf{x}$ is measured from its population mean, the variance matrix $\mathbf{V}$ 
is the average value of $\mathbf{x}\mathbf{x}'$. In practice we lose one or more degrees of freedom by measuring $\mathbf{x}$ 
from sample or regression means, but without loss of generality we shall suppose that our sample 
consists of measurements of $\mathbf{x}$ with $\nu$ degrees of freedom. 

Special cases of the simultaneous distribution of the correlations $l_i$, when $\mathbf{x}_1$ 
and $\mathbf{x}_2$ are unrelated, have been considered by Hotelling (1936) and Girshick 
(1939), but an important theoretical advance is represented by the derivation of 
the distribution (under the same conditions) for any values of $p$ and $q$ (Fisher, 
1939; Hsu, 1939). It will be shown that this distribution makes available further 
possible tests; and since the problem of the most appropriate tests of significance 
has not always been considered very adequately by other writers, it is also the 
purpose of this paper to explain the logical relation of these further tests to tests 
of significance previously available.* 


2. For the case $p\leq q$, the distribution of $l_i^2$, when these roots are arranged in 
order of magnitude, is given by 

$$F(l_1^2, l_2^2, \dots, l_p^2)\,dl_1^2\,dl_2^2\dots dl_p^2,$$

where 

$$F=C\prod_{i=1}^{p}\left\{(l_i^2)^{\frac12(q-p-1)}(1-l_i^2)^{\frac12(\nu-q-p-1)}\prod_{j=i+1}^{p}(l_i^2-l_j^2)\right\}$$

and 

$$C=\pi^{\frac12p}\prod_{i=1}^{p}\frac{\Gamma\{\tfrac12(\nu-i+1)\}}{\Gamma\{\tfrac12(q-i+1)\}\,\Gamma\{\tfrac12(\nu-q-i+1)\}\,\Gamma\{\tfrac12(p-i+1)\}}. \tag{1}$$

For $p>q$, we need only reverse the roles of $\mathbf{x}_1$ and $\mathbf{x}_2$. 
A criterion which is useful in detecting the simultaneous departure of several 
roots $\lambda_i^2$ from zero is the product 

$$\prod_{i=1}^{p}(1-l_i^2)=\Lambda,\ \text{say.†}$$

When $p=1$, the distribution of $\Lambda$ is equivalent to that of $l_1^2$, and the distribution 
in (1) can be transformed if required into Fisher's $z$-distribution. When $p=2$, it 
was found by Wilks that a similar distribution exists for $\sqrt{\Lambda}$. For $p>2$, no exact 
test is at present available, but the formula 

$$\chi^2=-\{\nu-\tfrac12(p+q+1)\}\log\Lambda,$$

with $pq$ degrees of freedom, gives a good approximate test (Bartlett, 1938). 
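The approximate test is simple to compute from the estimated squared correlations. The following is a minimal sketch, with hypothetical function names and purely illustrative trial figures (not data from any paper); it returns the statistic and its degrees of freedom and leaves the reference to $\chi^2$ tables to the reader.

```python
import math


def wilks_lambda(l_squared):
    """Product of (1 - l_i^2) over the estimated squared canonical correlations."""
    lam = 1.0
    for value in l_squared:
        lam *= 1.0 - value
    return lam


def bartlett_chi2(l_squared, nu, q):
    """Approximate chi-square, -{nu - (p + q + 1)/2} log(Lambda), with p*q d.f.

    p is taken as the number of squared correlations supplied (p <= q), and
    nu is the number of degrees of freedom of the sample.
    """
    p = len(l_squared)
    lam = wilks_lambda(l_squared)
    chi2 = -(nu - 0.5 * (p + q + 1)) * math.log(lam)
    return chi2, p * q


# Illustrative figures only: p = 2 roots, q = 3, nu = 50.
chi2, df = bartlett_chi2([0.5, 0.2], nu=50, q=3)
print(round(chi2, 3), df)
```

The natural logarithm is used, in keeping with the form of the approximation.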

If the roots Aj, ...,A2, are zero, we are, however, including in A irrelevant 
degrees of freedom which might possibly obscure the significance of A?. For any 
test on Aj by itself, we have little choice but to consider /?, though we do not really 
know whether /? is the root corresponding to A? or not. The probability distribu- 
tiont p(/?) is theoretically obtainable from (1), and hence also 0-05 or 0-01 levels 


* The distribution of $l_i^2$ obtained by Fisher and Hsu has also been obtained by Roy (1939), though this writer was concerned with the different problem of comparing the dispersion in two multivariate normal samples. For a single variate, testing the significance of a sum of squares separated off from the total sum of squares by a multiple correlation or regression formula is equivalent to testing the ratio of two variances, a criterion also employed to test the equality of two population variances. Roy has proposed generalizing the latter problem along lines which give rise to the same distribution problem solved by Fisher and Hsu, but while he has independently obtained the same general distribution, the need for some care in the choice of tests in multivariate analysis is even more evident in the problem with which Roy was concerned. It is obvious, for example, that the $p$ roots which Roy considered cannot represent all the possible differences among the $\tfrac12 p(p+1)$ variance and covariance parameters between two $p$-variate normal samples, and some explanation of their interpretation seems required.

† This criterion has been proposed by Wilks (1932), Bartlett (1934), and Hotelling (1936), the last-named denoting it by $z$.

‡ The probability of a random variable having a particular value $x$ is denoted by $p(x)$. If the variable has a continuous range of values, $p(x)$ denotes the probability of the variable falling in the interval $x$, $x + dx$. The corresponding notations $x \mid y$ and $p(x \mid y)$ are used when the variable is only being considered for a fixed value $y$ of another variable. The probability symbol $p$ is not of course to be confused with the number $p$ of components in the vector variate $\mathbf{x}_1$.




of significance of $l_1^2$ for specified values of $\nu$, $p$ and $q$. The tabulation of these levels would be useful, but would also be a task of some magnitude, and it is therefore worth noting that owing to the problem of identification, the largest root $l_1^2$ is not a sufficient statistic for $\lambda_1^2$, and $p(l_1^2)$ has no unique relevance. If we consider, instead, the distribution of $l_1^2$ for given values of $l_2^2, \ldots, l_p^2$, we have corresponding to the probability relation
$$p(l_1^2, \ldots, l_p^2) = p(l_1^2 \mid l_2^2, \ldots, l_p^2)\, p(l_2^2, \ldots, l_p^2),$$
the probability density relation
$$F(l_1^2, \ldots, l_p^2) = f_1(l_1^2 \mid l_2^2, \ldots, l_p^2)\, f_2(l_2^2, \ldots, l_p^2),$$
where the function $f_1$, apart from the constant term $f_2$, is determined at once from the function $F$.

In the logical situation we are postulating where $\lambda_1^2$, but not the other roots, is different from zero, it is not evident which distribution, $p(l_1^2)$ or $p(l_1^2 \mid l_2^2, \ldots, l_p^2)$, provides the more powerful test, owing to the absence of sufficiency properties, and it is of some interest to consider in detail another problem which is trivial in itself, but serves to illustrate the principles involved.


3. Suppose we have a pair of variates $x_1$ and $x_2$ both independently following a rectangular distribution
$$p(x) = dx, \qquad (0 < x < 1).$$
One variate (unspecified) is then shifted a distance $\alpha$, so that it follows the distribution
$$p(x) = dx, \qquad (\alpha < x < 1+\alpha).$$
If $x_a$ and $x_b$ denote the variates in order of magnitude, we shall detect the shift $\alpha$ from the larger value, $x_a$, if $\alpha$ is large enough. To compare the value of $p(x_a)$ and $p(x_a \mid x_b)$, we note first of all that when $\alpha = 0$,
$$p(x_a) = 2x_a\,dx_a, \qquad (0 < x_a < 1).$$
For the significance level $\epsilon$, $p(x_a)$ gives a critical value $x_a = \sqrt{(1-\epsilon)}$, while $p(x_a \mid x_b)$ gives $x_a = 1 - \epsilon(1 - x_b)$. If $\alpha$ is different from zero, a peculiar feature (analogous to the canonical correlation problem) is that the larger observation $x_a$ may or may not be associated with $\alpha$. For $p(x_a)$ we find
$$p(x_a) = \begin{cases} (2x_a - \alpha)\,dx_a, & (\alpha < x_a < 1) \\ dx_a, & (1 < x_a < 1+\alpha). \end{cases}$$


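For $p(x_a \mid x_b)$, we have
$$p(x_a \mid x_b) = \begin{cases} \dfrac{2\,dx_a}{\alpha + 2(1-x_b)}, & (x_b < x_a < 1) \\[2ex] \dfrac{dx_a}{\alpha + 2(1-x_b)}, & (1 < x_a < 1+\alpha). \end{cases}$$
This is provided $x_b > \alpha$; for $x_b < \alpha$, we have
$$p(x_a \mid x_b) = dx_a, \qquad (\alpha < x_a < 1+\alpha).$$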


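Using the terminology of Neyman and Pearson, we shall denote the power of the test derived from $x_a$ by $P$, and for that from $x_a \mid x_b$ by $P'$. Then
$$1 - P = \int_{\alpha}^{\sqrt{(1-\epsilon)}} (2x_a - \alpha)\,dx_a = 1 - \epsilon - \alpha\sqrt{(1-\epsilon)}.$$
For $1 - P'$, we have first of all, for given $x_b$, an integral
$$\int_{x_b}^{1-\epsilon(1-x_b)} p(x_a \mid x_b),$$
which gives
$$\begin{cases} 1 - \epsilon(1-x_b) - \alpha, & (x_b < \alpha) \\[1ex] \dfrac{2(1-\epsilon)(1-x_b)}{\alpha + 2(1-x_b)}, & (x_b \ge \alpha). \end{cases}$$
Since $p(x_b \mid \alpha)$ is given by
$$\begin{cases} \{\alpha + 2(1-x_b)\}\,dx_b, & (\alpha < x_b < 1) \\ dx_b, & (0 < x_b < \alpha), \end{cases}$$
we finally obtain, after averaging over $x_b$,
$$1 - P' = (1-\epsilon)(1-\alpha) - \tfrac12\alpha^2\epsilon.$$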
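Before comparing $P$ with $P'$, we may remember that we do not expect either $x_a$ or $x_a \mid x_b$ to provide the most powerful test obtainable. Theoretically we can see what this test would be by considering the ratio
$$p(x_a, x_b \mid \alpha)/p(x_a, x_b \mid 0) = X_\alpha,$$
say, though since the value of $X_\alpha$ is indeterminate unless the true value of $\alpha$ is specified, it should be realized that $X_\alpha$ does not provide us with any actual test, only with a theoretical upper limit for $P$ or $P'$.
The criterion $X_\alpha$ has the distribution
$$\begin{array}{lccc} X_\alpha = & \infty & 1 & 0 \\ p(X_\alpha \mid \alpha) = & \alpha & (1-\alpha)^2 & \alpha(1-\alpha) \\ p(X_\alpha \mid 0) = & 0 & (1-\alpha)^2 & 2\alpha - \alpha^2 \end{array}$$
For $(1-\alpha)^2 > \epsilon$, we shall allow the value $X_\alpha = 1$ to be significant in $\epsilon/(1-\alpha)^2$ of the times that the value 1 occurs; if $(1-\alpha)^2 < \epsilon$, we allow $X_\alpha = 0$ to be significant in the fraction
$$\frac{\epsilon - (1-\alpha)^2}{2\alpha - \alpha^2}$$
of the times that $X_\alpha = 0$ occurs. The power $P''$ of a test that could be based on $X_\alpha$ is then
$$P'' = \begin{cases} \alpha + \epsilon, & [(1-\alpha)^2 > \epsilon], \\[1ex] \alpha + (1-\alpha)^2 + \dfrac{\epsilon - (1-\alpha)^2}{2\alpha - \alpha^2}\,\alpha(1-\alpha), & [(1-\alpha)^2 < \epsilon]. \end{cases}$$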


Comparative values of P, P' and P'' are given in Table 1 for ε = 0.05 and 0.10.

Table 1

  ε            α = 0     0.1      0.2      0.4      0.6      0.8      0.9

 0.05   P      0.0500   0.1475   0.2449   0.4399   0.6348   0.8298   0.9272
        P'     0.0500   0.1453   0.2410   0.4340   0.6290   0.8260   0.9253
        P''    0.0500   0.1500   0.2500   0.4500   0.6500   0.8464   0.9373

 0.10   P      0.1000   0.1949   0.2897   0.4795   0.6692   0.8590   0.9538
        P'     0.1000   0.1905   0.2820   0.4680   0.6580   0.8520   0.9505
        P''    0.1000   0.2000   0.3000   0.5000   0.7000   0.8785   0.9603





























It will be seen that $p(x_a)$ provides a test in this problem rather more powerful than $p(x_a \mid x_b)$, but that the latter is quite effective. We cannot of course transfer this result to our main problem, but it is clear that $p(l_1^2 \mid l_2^2, \ldots, l_p^2)$ may justifiably be considered, at least until the distribution $p(l_1^2)$ has been tabulated.
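The closed forms for $P$ and $P'$ obtained in this section can be checked numerically. The following is only an illustrative sketch: the function names, the sample size and the seed are assumptions introduced here and are not part of the paper. It evaluates the two power formulae, $P = \epsilon + \alpha\sqrt{(1-\epsilon)}$ and $1 - P' = (1-\epsilon)(1-\alpha) - \tfrac12\alpha^2\epsilon$, and compares them with a simple simulation of one shifted and one unshifted rectangular variate.

```python
import math
import random


def power_largest(eps, alpha):
    """Power P of the test on the larger variate: P = eps + alpha*sqrt(1 - eps).

    Valid while alpha <= sqrt(1 - eps), which covers the range of Table 1.
    """
    return eps + alpha * math.sqrt(1.0 - eps)


def power_conditional(eps, alpha):
    """Power P' of the conditional test, from
    1 - P' = (1 - eps)(1 - alpha) - alpha^2 * eps / 2.

    Valid while alpha <= 1 - eps.
    """
    return 1.0 - ((1.0 - eps) * (1.0 - alpha) - 0.5 * alpha ** 2 * eps)


def simulate_powers(eps, alpha, trials=200_000, seed=1):
    """Monte Carlo estimate of (P, P') for one shifted and one unshifted variate."""
    rng = random.Random(seed)
    critical = math.sqrt(1.0 - eps)
    hits_p = 0
    hits_p_dash = 0
    for _ in range(trials):
        unshifted = rng.random()
        shifted = rng.random() + alpha
        larger = max(unshifted, shifted)
        smaller = min(unshifted, shifted)
        if larger > critical:                          # test based on p(x_a)
            hits_p += 1
        if larger > 1.0 - eps * (1.0 - smaller):       # test based on p(x_a | x_b)
            hits_p_dash += 1
    return hits_p / trials, hits_p_dash / trials


if __name__ == "__main__":
    for eps in (0.05, 0.10):
        for alpha in (0.1, 0.4, 0.8):
            print(eps, alpha,
                  round(power_largest(eps, alpha), 4),
                  round(power_conditional(eps, alpha), 4),
                  simulate_powers(eps, alpha))
```

The simulated frequencies should agree with the closed forms to within sampling error, which is a useful check on the rows for $P$ and $P'$ in Table 1.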


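4. Returning then to this distribution, we may examine one or two special cases before formally noting the significance level of $l_1^2$ in general. It has been shown by Fisher and Hsu that for $\nu$ large, the distribution of $l_1^2, l_2^2, \ldots, l_p^2$ tends to
$$G(m_1^2, m_2^2, \ldots, m_p^2)\,dm_1^2\,dm_2^2 \ldots dm_p^2,$$
where $m_i^2 = \tfrac12\nu l_i^2$,
$$G = C' \prod_{i=1}^{p} \Big\{ (m_i^2)^{\frac12(q-p-1)} e^{-m_i^2} \prod_{j=i+1}^{p} (m_i^2 - m_j^2) \Big\}$$
and
$$1/C' = \prod_{i=1}^{p} \{\Gamma\tfrac12(q-i+1)\,\Gamma\tfrac12(p-i+1)/\sqrt{\pi}\}. \tag{2}$$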
i=1 


For the particular case $p = 2$, $q = 3$, the distribution of $m_1^2 \mid m_2^2$ is
$$(m_1^2 - m_2^2)\,e^{-(m_1^2 - m_2^2)}\,dm_1^2,$$
which is a function simply of $m_1^2 - m_2^2$. If alternatively we consider the distribution $p(m_1^2)$, we obtain
$$2e^{-m_1^2}\{e^{-m_1^2} - (1 - m_1^2)\}\,dm_1^2,$$
the 0.05 significance level for which is 5.37. From $p(m_1^2 \mid m_2^2)$ this value of 5.37 corresponds to a level 0.030 if $m_2^2 = 0$, to 0.045 if $m_2^2$ is equal to its expected value 0.50, and to 0.05 when $m_2^2$ reaches the value 0.63. These results merely illustrate how the significance level of $m_1^2$ depends on which distribution is being used.
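The figures quoted for the case $p = 2$, $q = 3$ follow from integrating the two densities above. As a hedged sketch (the function names and the bisection routine are choices made here for illustration), the tail areas are $2x e^{-x} + e^{-2x}$ for the marginal distribution and $(u+1)e^{-u}$, with $u = x - m_2^2$, for the conditional one:

```python
import math


def tail_marginal(x):
    """P(m1^2 > x) under the density 2 e^{-x} {e^{-x} - (1 - x)}."""
    return 2.0 * x * math.exp(-x) + math.exp(-2.0 * x)


def tail_conditional(x, m2_sq):
    """P(m1^2 > x | m2^2) under the density u e^{-u}, where u = m1^2 - m2^2."""
    u = x - m2_sq
    return (u + 1.0) * math.exp(-u)


def critical_value(level, lo=0.0, hi=50.0):
    """Bisection for the x at which the (decreasing) marginal tail area equals `level`."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_marginal(mid) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x05 = critical_value(0.05)                      # marginal 0.05 point
levels = [tail_conditional(5.37, m2) for m2 in (0.0, 0.50, 0.63)]

if __name__ == "__main__":
    print(round(x05, 2), [round(v, 3) for v in levels])
```

The conditional levels at $m_2^2 = 0$, 0.50 and 0.63 can then be set against the values 0.030, 0.045 and 0.05 given in the text.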
For the case $p = 3$, $q = 4$, the significance level for $m_1^2$ can be written
$$e^{-u}\Big\{(u+1) + \frac{u^2}{2+v}\Big\},$$
where $u = m_1^2 - m_2^2$, $v = m_2^2 - m_3^2$. The level of significance thus depends mainly on $u$, as we should expect, but the effect of $v$ is not negligible. The factor multiplying the exponential varies, for example, when $u = 4$, from $10\tfrac13$ for $v = 1$ to 13 for $v = 0$.
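The expression for $p = 3$, $q = 4$ can be confirmed by direct integration, since the conditional density of $m_1^2$ is then proportional to $t(t+v)e^{-t}$ with $t = m_1^2 - m_2^2$. The sketch below is illustrative only; the midpoint-rule quadrature and its settings are assumptions made here, not a method used in the paper.

```python
import math


def multiplying_factor(u, v):
    """Factor multiplying e^{-u}: (u + 1) + u^2 / (2 + v)."""
    return (u + 1.0) + u * u / (2.0 + v)


def level_closed_form(u, v):
    """Significance level e^{-u} {(u + 1) + u^2 / (2 + v)}."""
    return math.exp(-u) * multiplying_factor(u, v)


def _integral(v, lower, upper=60.0, steps=100_000):
    """Midpoint-rule integral of t (t + v) e^{-t} from `lower` to `upper`."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        t = lower + (k + 0.5) * h
        total += t * (t + v) * math.exp(-t)
    return total * h


def level_by_quadrature(u, v):
    """The same level as a ratio of integrals, taken from u and from 0."""
    return _integral(v, u) / _integral(v, 0.0)


if __name__ == "__main__":
    print(multiplying_factor(4, 1), multiplying_factor(4, 0))
    print(level_closed_form(4, 1), level_by_quadrature(4, 1))
```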


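The general expression for the significance level for $m_1^2$ is
$$\frac{\displaystyle\int_{m_1^2}^{\infty} (m_1^2)^{\frac12(q-p-1)}\,e^{-m_1^2} \prod_{i=2}^{p} (m_1^2 - m_i^2)\,dm_1^2}{\displaystyle\int_{m_2^2}^{\infty} (m_1^2)^{\frac12(q-p-1)}\,e^{-m_1^2} \prod_{i=2}^{p} (m_1^2 - m_i^2)\,dm_1^2}$$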


or, if we write Pa) = Race edz, 
by 
j { p \ p Pp 
Pn 2p +q+ 1))- \= mi Pn (H{p+q- n+{3 mm Im2(4Lp + 9—3]) 
i=2 i=2 j<i+1 : 











Tna(ilp +9+ 1) — {Emil Py pdtp+g—-1)+{ EE minjl Paaldlp+a—3) 
\i=2 i=2 j=i+2 
F (3) 
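For the more general case of finite $\nu$, we have similarly for $l_1^2$,
$$\frac{\displaystyle\int_{l_1^2}^{1} (l_1^2)^{\frac12(q-p-1)} (1-l_1^2)^{\frac12(\nu-p-q-1)} \prod_{i=2}^{p} (l_1^2 - l_i^2)\,dl_1^2}{\displaystyle\int_{l_2^2}^{1} (l_1^2)^{\frac12(q-p-1)} (1-l_1^2)^{\frac12(\nu-p-q-1)} \prod_{i=2}^{p} (l_1^2 - l_i^2)\,dl_1^2}$$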
1 
or, if B,(a, 8) = | x*1 (1-2) dz, 
by , 
B,(tlp +9 + 1), 3v»-p—gqt+1))- { 3118, (le +4— 1}, 3[v-—p—q+1])+... 
} . (4) 
Baltp+q+ 1), tv—p—q+1))- { A)B,. (ip+9- 1), {»—p—qt+1))+... 


2 

The dependence of results (3) and (4) not only on $\nu$, $p$ and $q$, but also on the particular values of $l_2^2, \ldots, l_p^2$, makes it impracticable to tabulate the 0.05 or other levels of significance; but it is not difficult in any instance to find the exact level from (3) or (4), using the published tables of $\Gamma_x(a)$ or $B_x(a, b)$.*

It must be recognized that if the second root $\lambda_2^2$ is also different from zero, the distribution of $l_1^2$ for given $l_2^2$ is quite irrelevant, but except possibly when $p$ is rather large, it is probable that two or more non-zero roots would be detected by the $\Lambda$ criterion, and the testing of $\lambda_1^2$ alone by means of $l_1^2$ (a test which is still not completely efficient) would not arise.


5. Directly we have established the existence of at least one root $\lambda_1^2$, we may always proceed to eliminate this correlation $\lambda_1$ and the corresponding pair of canonical variates, and analyse the remainder. The theory of eliminating from
X, a set of specified variates represented, say, by the vector variate x, has been 

* Tables of the Incomplete Γ-function, ed. K. Pearson (1922, His Majesty's Stationery Office, London); Tables of the Incomplete Beta-function, ed. K. Pearson (1934, Biometrika Office, University College, London).




indicated by Bartlett (1939).* As a particular case, x, may be a hypothetical set 
of r canonical variates of x,, and the criterion A(v—r, p—r,q) for the remaining 
variate Xo, in place of the original criterion A(v, p,q) for x,, would test the good- 
ness of fit of the hypothetical vector canonical variate x,. In the case g = 1, we 
have the goodness of fit of a hypothetical discriminant function, the problem of 
which was first raised by Fisher (1938). 

It has, however, also been pointed out (Bartiett, 1938, p. 39) that if the canoni- 
cal vector variate X, has been estimated from the data, the symmetrical relation 
between x, and x, will imply that each has only p—r and g—r independent 
components remaining, the $\chi^2$ approximation for the criterion
$$\Lambda' = \prod_{i=r+1}^{p} (1 - l_i^2)$$
being
$$-\{(\nu - r) - \tfrac12[(p-r) + (q-r) + 1]\}\log\Lambda' = -\{\nu - \tfrac12(p+q+1)\}\log\Lambda',$$

with (p—r)(q—r) degrees of freedom. It was stressed that this reduction of the 
degrees of freedom essentially depends on the existence of non-zero roots 
Az, ...,A2, so that the vector variate x, is well-determined, and any effect of 
selection of [?, ...,7? from /2, ...,/2 can be neglected. Under the same conditions, 
we may approximately use the tests known for p = 1 or 2, for the criterion 
A'(v—r,p—r,q—r), when p—r = 1 or 2. 


6. To demonstrate the reduction in degrees of freedom in the case $r = 1$, consider the case when $\nu$ is large, and
$$-\nu\log\Lambda \to \sum_{i=1}^{p} \nu l_i^2 = \chi^2.$$
If $\nu l_i^2 = \theta_i$, the determinantal equation for $\theta_i$ is of the form
$$|A - \theta V| = 0,$$
where $V$ denotes the variance matrix of $\mathbf{x}_1$, and $A$ is a matrix of the sums of squares and products among the $p$ variates of $\mathbf{x}_1$ for that portion of the sample separated off in terms of the independent vector variate $\mathbf{x}_2$. Without loss of generality, we shall suppose that $V = 1$.

Regarding the $\nu$ observations for any variate as a vector with $\nu$ orthogonal components, let us now add to the chance variation of the first variate of $\mathbf{x}_1$ a part dependent on each of the $q$ (orthogonal) variates of $\mathbf{x}_2$. For each variate of $\mathbf{x}_2$, the length of the vector representing the first variate of $\mathbf{x}_1$ will then receive an addition $X_k$, say, $(k = 1 \ldots q)$, which will be of order $\sqrt{\nu}$. Partitioning off the first variate of $\mathbf{x}_1$, we obtain, as our new equation for $\theta$,
$$\begin{vmatrix} a_{11} + 2\Sigma x_1X + \Sigma X^2 - \theta & a_{1j} + \Sigma x_jX \\ a_{i1} + \Sigma x_iX & a_{ij} - \theta \end{vmatrix} = 0.$$

* See equation (2.8) of the paper cited, and the immediately preceding equation. 




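The summation sign is for the $q$ degrees of freedom of $\mathbf{x}_2$, and $a_{ij} = \Sigma x_ix_j$, where $x_1, \ldots, x_p$ are the $p$ variates of $\mathbf{x}_1$. Solving the equation for the largest root, we have
$$\theta_1 = \Sigma X^2 + 2\Sigma x_1X + a_{11} + \sum_{i=2}^{p} \frac{(\Sigma x_iX)^2}{\Sigma X^2} + O\Big(\frac{1}{\sqrt{\nu}}\Big).$$
If we neglect the last term, the sum of the remaining roots becomes
$$\sum_{i=1}^{p}\theta_i - \theta_1 = \sum_{i=2}^{p} \Big\{ a_{ii} - \frac{(\Sigma x_iX)^2}{\Sigma X^2} \Big\},$$
which is a $\chi^2$ with $(p-1)(q-1)$ degrees of freedom.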
7. To illustrate the use of this test we may consider the data from Kelley quoted by Hotelling (1936), these consisting of correlations among tests in (1) reading speed, (2) reading power, (3) arithmetic speed and (4) arithmetic power, the sample being one of 140 seventh-grade school children. Hotelling, investigating the relation of arithmetical with reading abilities, found canonical correlations
$$l_1 = 0.3945, \qquad l_2 = 0.0688.$$
Since $\nu = 139$, $p = 2$, $q = 2$, the first correlation gives a contribution to $\chi^2$ of
$$-\{139 - \tfrac12(2+2+1)\}\log(1 - 0.3945^2) = 23.09.$$
Similarly the contribution from $l_2$ is 0.64. The $\chi^2$ analysis is consequently summarized as in Table 2.











Table 2

            D.F.      χ²
  l_1        3       23.09
  l_2        1        0.64
  Total      4       23.73

















It is evident at once, as Hotelling concluded from other tests, that there is a significant relation between arithmetical and reading abilities, which arises entirely from the first canonical correlation.
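The arithmetic of this example is easily reproduced. The sketch below is an illustration only (the function names are introduced here and are not from the paper); it applies the factor $\nu - \tfrac12(p+q+1)$ to $-\log(1 - l_i^2)$ for each correlation and splits the $pq$ degrees of freedom as in Table 2, the last $(p-1)(q-1)$ belonging to the roots after the first.

```python
import math


def chi2_contributions(correlations, nu, p, q):
    """Contribution of each canonical correlation to
    chi^2 = -{nu - (p + q + 1)/2} log Lambda, with Lambda = prod(1 - l_i^2)."""
    factor = nu - 0.5 * (p + q + 1)
    return [-factor * math.log(1.0 - l * l) for l in correlations]


def degrees_of_freedom(p, q):
    """(d.f. ascribed to the first root, d.f. for the remaining roots, total)."""
    total = p * q
    remainder = (p - 1) * (q - 1)
    return total - remainder, remainder, total


# Kelley's data as quoted in the text: nu = 139, p = q = 2.
contributions = chi2_contributions([0.3945, 0.0688], nu=139, p=2, q=2)
df_first, df_rest, df_total = degrees_of_freedom(2, 2)

if __name__ == "__main__":
    print([round(c, 2) for c in contributions], round(sum(contributions), 2))
    print(df_first, df_rest, df_total)
```

The first contribution agrees with the 23.09 of the text; the second is small and may differ from the printed 0.64 in the last figure, depending on the rounding used.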




REFERENCES 


BARTLETT, M. S. (1934). "The vector representation of a sample." Proc. Camb. Phil. Soc. 30, 327-40.

BARTLETT, M. S. (1938). "Further aspects of the theory of multiple regression." Proc. Camb. Phil. Soc. 34, 33-40.

BARTLETT, M. S. (1939). "A note on tests of significance in multivariate analysis." Proc. Camb. Phil. Soc. 35, 180-5.

FISHER, R. A. (1938). "The statistical utilization of multiple measurements." Ann. Eugen. 8, 376-86.

FISHER, R. A. (1939). "The sampling distribution of some statistics obtained from non-linear equations." Ann. Eugen. 9, 238-49.

GIRSCHICK, M. A. (1939). "On the sampling theory of the roots of determinantal equations." Ann. Math. Statist. 10, 203-24.

HOTELLING, H. (1936). "Relations between two sets of variates." Biometrika, 28, 321-77.

HSU, P. L. (1939). "On the distribution of roots of certain determinantal equations." Ann. Eugen. 9, 250-8.

ROY, S. N. (1939). "p-statistics or some generalizations in analysis of variance appropriate to multivariate problems." Sankhyā, 4, 381-96.

WILKS, S. S. (1932). "Certain generalizations in the analysis of variance." Biometrika, 24, 471-94.

















ON THE LIMITING DISTRIBUTION OF THE CANONICAL 
CORRELATIONS 


By P. L. HSU 


1. The purpose of this paper is to deduce the limiting distribution of Hotelling's canonical correlations* under the most general assumption on the population canonical correlations. The result is stated in the theorem at the end of this paper.

The method employed here is essentially the same as that by which we derived (Hsu, 1940) the limiting distribution of Fisher's discriminating components. In what follows, steps in the derivation are given while strictly rigorous reasoning is left out. The latter may be found in the author's 1940 paper.

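The parent distribution is represented by the density
$$\text{const.}\,\exp\Big\{-\tfrac12\Big(\sum_{i,j=1}^{p} \alpha_{ij}a_{ij} + 2\sum_{i=1}^{p}\sum_{g=1}^{q} \beta_{ig}b_{ig} + \sum_{g,h=1}^{q} \gamma_{gh}c_{gh}\Big)\Big\} \quad (p \le q), \tag{1}$$
where
$$a_{ij} = \sum_{\nu=1}^{n} x_{i\nu}x_{j\nu}, \quad b_{ig} = \sum_{\nu=1}^{n} x_{i\nu}y_{g\nu}, \quad c_{gh} = \sum_{\nu=1}^{n} y_{g\nu}y_{h\nu}. \tag{2}$$
By virtue of Hotelling's reduction, the matrix of variances and covariances is taken to be
$$\begin{Vmatrix} \alpha_{11} & \cdots & \alpha_{1p} & \beta_{11} & \cdots & \beta_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ \alpha_{p1} & \cdots & \alpha_{pp} & \beta_{p1} & \cdots & \beta_{pq} \\ \beta_{11} & \cdots & \beta_{p1} & \gamma_{11} & \cdots & \gamma_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ \beta_{1q} & \cdots & \beta_{pq} & \gamma_{q1} & \cdots & \gamma_{qq} \end{Vmatrix}^{-1} = \begin{Vmatrix} 1 & \cdots & 0 & \rho'_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & \rho'_p & 0 & \cdots & 0 \\ \rho'_1 & \cdots & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & \rho'_p & 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 1 \end{Vmatrix}, \tag{3}$$
where $\rho'_1, \ldots, \rho'_p$ are the population canonical correlations. The sample canonical correlations, $r_1, \ldots, r_p$, are the positive roots of the equation†
$$\begin{vmatrix} ra_{11} & \cdots & ra_{1p} & b_{11} & \cdots & b_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ ra_{p1} & \cdots & ra_{pp} & b_{p1} & \cdots & b_{pq} \\ b_{11} & \cdots & b_{p1} & rc_{11} & \cdots & rc_{1q} \\ \vdots & & \vdots & \vdots & & \vdots \\ b_{1q} & \cdots & b_{pq} & rc_{q1} & \cdots & rc_{qq} \end{vmatrix} = 0. \tag{4}$$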


* Hotelling (1936). For further work on the distribution of the canonical correlations, see Madow (1938), Girschick (1939) and Hsu (1939).
† We use $r$ in (4) instead of $-r$ as in Hotelling's original definition because it is known that the non-vanishing roots of (4) form pairs each of which have the same absolute value but opposite signs.













We set Pi = +--+ = Py, = Pv 
i o 
| re =... = Ps= Py 
ie 3 MRS ea (6) 
pS ey a, | oe es so (7) 
PPP 82s FE (8) 
and proceed to find the limiting distribution of $r_1, \ldots, r_p$ as $n \to \infty$.
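2. LEMMA 1. We have the identity
$$\begin{vmatrix} P & Q \\ R & S \end{vmatrix} = |S| \cdot |P - QS^{-1}R|. \tag{9}$$
This results from the identity
$$\begin{Vmatrix} P & Q \\ R & S \end{Vmatrix} \cdot \begin{Vmatrix} I & O \\ -S^{-1}R & I \end{Vmatrix} = \begin{Vmatrix} P - QS^{-1}R & Q \\ O & S \end{Vmatrix}$$
on taking determinants on both sides.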
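Identity (9) can be checked on a small numerical case. The sketch below is illustrative only: the four $2 \times 2$ blocks are arbitrary numbers chosen here, and the helper functions are written for small matrices rather than for efficiency. Exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def matmul(a, b):
    """Ordinary matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse_2x2(s):
    """Inverse of a non-singular 2 x 2 matrix."""
    d = det(s)
    return [[s[1][1] / d, -s[0][1] / d], [-s[1][0] / d, s[0][0] / d]]


def to_fractions(m):
    return [[Fraction(x) for x in row] for row in m]


# Arbitrary illustrative blocks (S must be non-singular).
P = to_fractions([[2, 1], [0, 3]])
Q = to_fractions([[1, 4], [2, 0]])
R = to_fractions([[3, 1], [1, 2]])
S = to_fractions([[5, 2], [1, 1]])

full = [P[0] + Q[0], P[1] + Q[1], R[0] + S[0], R[1] + S[1]]
QSinvR = matmul(matmul(Q, inverse_2x2(S)), R)
schur = [[P[i][j] - QSinvR[i][j] for j in range(2)] for i in range(2)]

lhs = det(full)                 # determinant of the partitioned matrix
rhs = det(S) * det(schur)       # |S| . |P - Q S^{-1} R|

if __name__ == "__main__":
    print(lhs, rhs)
```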
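LEMMA 2. Let
$$\left.\begin{aligned} a_{ii} &= n + \sqrt{n}\,u_{ii}, & a_{ij} &= \sqrt{n}\,u_{ij} \quad (i \ne j), \\ b_{ii} &= n\rho'_i + \sqrt{n}\,v_{ii}, & b_{ig} &= \sqrt{n}\,v_{ig} \quad (i \ne g), \\ c_{gg} &= n + \sqrt{n}\,w_{gg}, & c_{gh} &= \sqrt{n}\,w_{gh} \quad (g \ne h), \end{aligned}\right\} \tag{10}$$
$$(i, j = 1, \ldots, p;\ g, h = 1, \ldots, q).$$

The distribution of the u's, v's and w's approaches that of $\tfrac12(p+q)(p+q+1)$ normal variates whose means are zero and whose second moments are specified in the following statements:

(i) any v or w which has at least one suffix number $> p$ is uncorrelated with all the others;

(ii) any member of one of the sets $(u_{ii}, v_{ii}, w_{ii})$, $(i = 1, \ldots, p)$, $(u_{ij}, v_{ij}, v_{ji}, w_{ij})$, $(i, j = 1, \ldots, p;\ i < j)$ is uncorrelated with all the members of all the other sets;

(iii) for $i, j = 1, \ldots, p$ we have
$$\left.\begin{aligned} &\mathcal{E}(u_{ii}^2) = \mathcal{E}(w_{ii}^2) = 2, \\ &\mathcal{E}(v_{ii}^2) = 1 + \rho_i'^2, \\ &\mathcal{E}(u_{ii}w_{ii}) = 2\rho_i'^2, \\ &\mathcal{E}(u_{ii}v_{ii}) = \mathcal{E}(w_{ii}v_{ii}) = 2\rho'_i, \\ &\mathcal{E}(u_{ij}^2) = \mathcal{E}(v_{ij}^2) = \mathcal{E}(w_{ij}^2) = 1 \quad (i \ne j), \\ &\mathcal{E}(u_{ij}w_{ij}) = \mathcal{E}(v_{ij}v_{ji}) = \rho'_i\rho'_j \quad (i \ne j), \\ &\mathcal{E}(u_{ij}v_{ji}) = \mathcal{E}(w_{ij}v_{ij}) = \rho'_i \quad (i \ne j). \end{aligned}\right\} \tag{11}$$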



















This follows immediately from the well-known central limit theorem.* The second moments may easily be computed by virtue of (3).

COROLLARY. Under the assumption (6) any u, v or w which has at least one suffix number $> s$ is uncorrelated with all the others, and $\mathcal{E}(v_{ii}^2) = 1$ for $i = s+1, \ldots, p$.
This results from (11) on putting $\rho'_i = 0$ for $i = s+1, \ldots, p$.
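The second moments in (11) are covariances of products of pairs of normal variates, and can be verified mechanically from the canonical form (3). The sketch below is an illustration under stated assumptions: it takes each $x_i$ and $y_g$ to have unit variance, with correlation $\rho'_i$ between $x_i$ and $y_i$ and zero otherwise, and applies the standard product-moment rule $\mathrm{cov}(ab, cd) = \mathcal{E}(ac)\mathcal{E}(bd) + \mathcal{E}(ad)\mathcal{E}(bc)$ for zero-mean normal variates. All function names and the numerical values of the correlations are introduced here for the purpose.

```python
def base_cov(a, b, rho):
    """Covariance of two unit-variance base variates, each ('x', i) or ('y', i).

    Variates with different suffixes are independent; x_i and y_i have
    correlation rho[i].
    """
    (kind_a, i), (kind_b, j) = a, b
    if i != j:
        return 0.0
    return 1.0 if kind_a == kind_b else rho[i]


def cov_of_products(a, b, c, d, rho):
    """cov(ab, cd) for zero-mean jointly normal variates."""
    return (base_cov(a, c, rho) * base_cov(b, d, rho)
            + base_cov(a, d, rho) * base_cov(b, c, rho))


def u(i, j):
    return (("x", i), ("x", j))


def v(i, g):
    return (("x", i), ("y", g))


def w(g, h):
    return (("y", g), ("y", h))


def second_moment(first, second, rho):
    """Limiting second moment E(first * second) of two of the u, v, w variates."""
    return cov_of_products(first[0], first[1], second[0], second[1], rho)


rho = {1: 0.6, 2: 0.3}   # illustrative population correlations

checks = [
    (second_moment(u(1, 1), u(1, 1), rho), 2.0),
    (second_moment(w(1, 1), w(1, 1), rho), 2.0),
    (second_moment(v(1, 1), v(1, 1), rho), 1.0 + rho[1] ** 2),
    (second_moment(u(1, 1), w(1, 1), rho), 2.0 * rho[1] ** 2),
    (second_moment(u(1, 1), v(1, 1), rho), 2.0 * rho[1]),
    (second_moment(w(1, 1), v(1, 1), rho), 2.0 * rho[1]),
    (second_moment(u(1, 2), u(1, 2), rho), 1.0),
    (second_moment(v(1, 2), v(1, 2), rho), 1.0),
    (second_moment(u(1, 2), w(1, 2), rho), rho[1] * rho[2]),
    (second_moment(v(1, 2), v(2, 1), rho), rho[1] * rho[2]),
    (second_moment(u(1, 2), v(2, 1), rho), rho[1]),
    (second_moment(w(1, 2), v(1, 2), rho), rho[1]),
]

if __name__ == "__main__":
    for got, want in checks:
        print(got, want)
```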


3. We may now find the limiting distribution of $r_{s+1}, \ldots, r_p$, the $p - s$ smallest correlations.

We substitute (10) in (4) and then divide each element by $n$. There results the equation
$$\begin{vmatrix} r\Big(\delta_{ij} + \dfrac{u_{ij}}{\sqrt{n}}\Big) & \rho'_i\delta_{ig} + \dfrac{v_{ig}}{\sqrt{n}} \\[2ex] \rho'_j\delta_{jh} + \dfrac{v_{jh}}{\sqrt{n}} & r\Big(\delta_{gh} + \dfrac{w_{gh}}{\sqrt{n}}\Big) \end{vmatrix} = 0, \tag{12}$$
a determinant of order $p + q$ written here by its typical elements: the rows and columns of the first block are indexed by $i, j = 1, \ldots, p$, those of the last block by $g, h = 1, \ldots, q$, the lower left block is the transpose of the upper right, $\delta$ is unity for equal suffixes and zero otherwise, and $\rho'_i = 0$ for $i > s$ by (6).


The equation (12) has $p - s$ roots which are $o(1)$ for large $n$. To evaluate these, we substitute $n^{-\frac12}\zeta^{\frac12}$ for $r$ in (12), delete the common factor $n^{-\frac12}$ from the rows
$$s+1,\ s+2,\ \ldots,\ p,\ p+s+1,\ p+s+2,\ \ldots,\ p+q,$$


* Cf. Cramér (1937), pp. 113-14. 


and then let $n \to \infty$. There results the equation














0 0 0 0 Pi 0 0 0 
0 0 0 0 0 Ps 0 0 
0 0 =F 0 Usita +e+ Usits Uszisia +> Ysitg 
0 0 0 —9 Vy Ups Upset Ung . 
Pi 0 0 0 0 0 0 er ees. 
0 p 0 0 0 0 0 0 
Vi s41 Up 541 Us41s4+1 Vy s+1 0 0 <a 0 
Vig Us Yarre Ung 0 0 0 — 
i.e. a Fee 0 Usiie+1 Vsitg 
0 - v v 
+ wa a Oy ee (13) 
Ustist+1 °*> Up,stt 7 7 0 
Usite@ Ung 0 = 7 
By (9) the left-hand side of (13) is equal to 
dy.1641- 9° dsi19 
(a) ead Eee ee ares (14) 
dq s+1 dog a 
@ . . 
where deg = Yi Vegtsg (8,f=Ot+1,....P),  =—§«-«*—aaenee (15) 
g=s+1 


Let $\zeta'_{s+1}, \ldots, \zeta'_p$, in descending order of magnitude, be the latent roots of the matrix $\|d_{ij}\|$. Then the $p - s$ roots of (12) which are $o(1)$ for large $n$ are
$$n^{-\frac12}\zeta_i'^{\frac12} + o(n^{-\frac12}) \qquad (i = s+1, \ldots, p).$$
If we define $\zeta_{s+1}, \ldots, \zeta_p$ by putting
$$r_i = n^{-\frac12}\zeta_i^{\frac12} \qquad (i = s+1, \ldots, p), \tag{16}$$
then the $\zeta$'s have the same limiting distribution as the $\zeta'$'s. Hence the limiting distribution of the $\zeta_i$ may be derived as the distribution of the latent roots of $\|d_{ij}\|$, in which the v's are regarded as having a distribution which is the limiting distribution described in Lemma 2. By virtue of the Corollary this is the distribution of $(p-s)(q-s)$ mutually independent normal variates with zero mean













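and unit standard deviation. Therefore* the limiting distribution of the $\zeta_i = nr_i^2$ has the density function
$$2^{-\frac12(p-s)(q-s)}\,\pi^{\frac12(p-s)} \Big\{\prod_{i=1}^{p-s} \Gamma\tfrac12(p-s-i+1)\,\Gamma\tfrac12(q-s-i+1)\Big\}^{-1} \Big\{\prod_{i=s+1}^{p}\prod_{j=i+1}^{p} (\zeta_i - \zeta_j)\Big\} \Big(\prod_{i=s+1}^{p} \zeta_i\Big)^{\frac12(q-p-1)} \exp\Big(-\tfrac12\sum_{i=s+1}^{p} \zeta_i\Big), \tag{17}$$
$$\infty > \zeta_{s+1} \ge \ldots \ge \zeta_p \ge 0.$$

The transformation $\zeta_i = \eta_i^2$ gives the following density for the limiting distribution of the $\eta_i = n^{\frac12}r_i$ $(i = s+1, \ldots, p)$:
$$f_1(\eta_{s+1}, \ldots, \eta_p) = 2^{-\frac12(p-s)(q-s-2)}\,\pi^{\frac12(p-s)} \Big\{\prod_{i=1}^{p-s} \Gamma\tfrac12(p-s-i+1)\,\Gamma\tfrac12(q-s-i+1)\Big\}^{-1} \Big\{\prod_{i=s+1}^{p}\prod_{j=i+1}^{p} (\eta_i^2 - \eta_j^2)\Big\} \Big(\prod_{i=s+1}^{p} \eta_i\Big)^{q-p} \exp\Big(-\tfrac12\sum_{i=s+1}^{p} \eta_i^2\Big), \tag{18}$$
$$\infty > \eta_{s+1} \ge \ldots \ge \eta_p \ge 0.$$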


4. We now proceed to find the limiting distribution of $r_1, \ldots, r_s$. By virtue of (9) we may write the left-hand side of (4) as
$$r^{q-p}\,|C| \cdot |r^2A - BC^{-1}B'|,$$

















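where
$$A = \begin{Vmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pp} \end{Vmatrix}, \quad B = \begin{Vmatrix} b_{11} & \cdots & b_{1q} \\ \vdots & & \vdots \\ b_{p1} & \cdots & b_{pq} \end{Vmatrix}, \quad C = \begin{Vmatrix} c_{11} & \cdots & c_{1q} \\ \vdots & & \vdots \\ c_{q1} & \cdots & c_{qq} \end{Vmatrix}.$$
Hence, if we set $\theta_i = r_i^2$ $(i = 1, \ldots, p)$, the $\theta_i$ are the roots of the equation
$$|BC^{-1}B' - \theta A| = 0. \tag{21}$$
Substituting (10) in (21) and dividing each element by $n$, we get
$$|(\Lambda + n^{-\frac12}V)(I + n^{-\frac12}W)^{-1}(\Lambda' + n^{-\frac12}V') - \theta(I + n^{-\frac12}U)| = 0,$$
where
$$\Lambda = \|\rho'_i\delta_{ig}\|, \quad U = \|u_{ij}\|, \quad V = \|v_{ig}\|, \quad W = \|w_{gh}\| \qquad (i, j = 1, \ldots, p;\ g, h = 1, \ldots, q).$$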









































* Hsu (1939), pp. 256-7. 


By virtue 


tice (21) 


asad (22) 


ssseee(24) 


Neglecting higher powers of $n^{-\frac12}$ we write $I - n^{-\frac12}W$ for $(I + n^{-\frac12}W)^{-1}$ and carry out the matrix multiplication to the term with $n^{-\frac12}$ as a factor. There results the equation
$$|\Lambda\Lambda' + n^{-\frac12}(V\Lambda' + \Lambda V' - \Lambda W\Lambda') - \theta(I + n^{-\frac12}U)| = 0, \tag{25}$$
i.e.

the determinant of order $p$ whose typical element is shown,
$$\Big|\,\delta_{ij}(\rho_i'^2 - \theta) + n^{-\frac12}(\rho'_jv_{ij} + \rho'_iv_{ji} - \rho'_i\rho'_jw_{ij} - \theta u_{ij})\,\Big|_{i,j = 1, \ldots, p} = 0, \tag{26}$$
so that the diagonal elements are $\rho_i'^2 - \theta + n^{-\frac12}(2\rho'_iv_{ii} - \rho_i'^2w_{ii} - \theta u_{ii})$.

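On account of (5) there are $\mu_1$ roots of (26) which are $\rho_1^2 + o(1)$ for large $n$. To evaluate these we substitute $\rho_1^2 + n^{-\frac12}\rho_1\zeta$ for $\theta$ in (26). Since the first $\mu_1$ of the $\rho'$'s are equal to $\rho_1$, there will be a common factor $n^{-\frac12}$ in each of the first $\mu_1$ rows. After deleting this factor and then letting $n \to \infty$, we get the equation
$$\begin{vmatrix} 2v_{11} - \rho_1(u_{11} + w_{11}) - \zeta & v_{12} + v_{21} - \rho_1(u_{12} + w_{12}) & \cdots & v_{1\mu_1} + v_{\mu_11} - \rho_1(u_{1\mu_1} + w_{1\mu_1}) \\ v_{12} + v_{21} - \rho_1(u_{12} + w_{12}) & 2v_{22} - \rho_1(u_{22} + w_{22}) - \zeta & \cdots & v_{2\mu_1} + v_{\mu_12} - \rho_1(u_{2\mu_1} + w_{2\mu_1}) \\ \vdots & \vdots & & \vdots \\ v_{1\mu_1} + v_{\mu_11} - \rho_1(u_{1\mu_1} + w_{1\mu_1}) & v_{2\mu_1} + v_{\mu_12} - \rho_1(u_{2\mu_1} + w_{2\mu_1}) & \cdots & 2v_{\mu_1\mu_1} - \rho_1(u_{\mu_1\mu_1} + w_{\mu_1\mu_1}) - \zeta \end{vmatrix} = 0, \tag{27}$$
i.e.
$$\begin{vmatrix} z_{11} - \zeta & \cdots & z_{1\mu_1} \\ \vdots & & \vdots \\ z_{\mu_11} & \cdots & z_{\mu_1\mu_1} - \zeta \end{vmatrix} = 0, \tag{28}$$
where
$$z_{ij} = v_{ij} + v_{ji} - \rho_1(u_{ij} + w_{ij}) \qquad (i, j = 1, \ldots, \mu_1). \tag{29}$$

Let $\zeta'_1, \ldots, \zeta'_{\mu_1}$ be the roots, in descending order of magnitude, of (28). Then the $\mu_1$ roots of (26) which are $\rho_1^2 + o(1)$ for large $n$ are
$$\rho_1^2 + n^{-\frac12}\rho_1\zeta'_i + o(n^{-\frac12}) \qquad (i = 1, \ldots, \mu_1).$$
If we define $\zeta_1, \ldots, \zeta_{\mu_1}$ by putting
$$\theta_i = \rho_1^2 + n^{-\frac12}\rho_1\zeta_i \qquad (i = 1, \ldots, \mu_1), \tag{30}$$
then the $\zeta$'s have the same limiting distribution as the $\zeta'$'s. Hence the limiting distribution of the $\zeta_i$ may be derived as the distribution of the latent roots of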










the matrix $\|z_{ij}\|$, in which the u's, v's and w's are regarded as having a distribution which is the limiting distribution described in Lemma 2.

Now, if all the u's, v's and w's are normal variates with zero mean, so are the z's. Owing to the fact (ii) of Lemma 2 all the z's are uncorrelated. Using the formulae (11) to calculate the variances of the z's, we easily obtain
$$\mathcal{E}(z_{ii}^2) = 4(1 - \rho_1^2)^2, \qquad \mathcal{E}(z_{ij}^2) = 2(1 - \rho_1^2)^2 \quad (i \ne j), \tag{31}$$
$$(i, j = 1, \ldots, \mu_1).$$
Hence
$$z_{ii} = 2(1 - \rho_1^2)\,t_{ii}, \qquad z_{ij} = \sqrt{2}\,(1 - \rho_1^2)\,t_{ij} \quad (i \ne j), \tag{32}$$
$$(i, j = 1, \ldots, \mu_1),$$
where the t's are mutually independent normal variates with zero mean and unit standard deviation.
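The variances in (31) follow from (11) by expanding the variance of $z_{ij} = v_{ij} + v_{ji} - \rho_1(u_{ij} + w_{ij})$ with both population correlations equal to $\rho_1$. The short sketch below carries out that expansion numerically; it is illustrative only, and the function names are introduced here.

```python
def var_z_diagonal(rho):
    """Variance of z_ii = 2 v_ii - rho (u_ii + w_ii), using the moments in (11)."""
    var_u = var_w = 2.0
    var_v = 1.0 + rho ** 2
    cov_uw = 2.0 * rho ** 2
    cov_uv = cov_wv = 2.0 * rho
    return (4.0 * var_v
            + rho ** 2 * (var_u + var_w + 2.0 * cov_uw)
            - 4.0 * rho * (cov_uv + cov_wv))


def var_z_off_diagonal(rho):
    """Variance of z_ij = v_ij + v_ji - rho (u_ij + w_ij), i != j, using (11)."""
    var_v_sum = 1.0 + 1.0 + 2.0 * rho * rho      # var(v_ij + v_ji)
    var_uw_sum = 1.0 + 1.0 + 2.0 * rho * rho     # var(u_ij + w_ij)
    cov_cross = 4.0 * rho                        # four cross moments, each equal to rho
    return var_v_sum + rho ** 2 * var_uw_sum - 2.0 * rho * cov_cross


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.9):
        print(rho, var_z_diagonal(rho), 4.0 * (1.0 - rho ** 2) ** 2,
              var_z_off_diagonal(rho), 2.0 * (1.0 - rho ** 2) ** 2)
```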


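Setting $\zeta = 2(1 - \rho_1^2)\eta$ in (28), we get, by (32),
$$\begin{vmatrix} t_{11} - \eta & 2^{-\frac12}t_{12} & \cdots & 2^{-\frac12}t_{1\mu_1} \\ 2^{-\frac12}t_{21} & t_{22} - \eta & \cdots & 2^{-\frac12}t_{2\mu_1} \\ \vdots & \vdots & & \vdots \\ 2^{-\frac12}t_{\mu_11} & 2^{-\frac12}t_{\mu_12} & \cdots & t_{\mu_1\mu_1} - \eta \end{vmatrix} = 0. \tag{33}$$
Let $\eta'_1, \ldots, \eta'_{\mu_1}$ be the roots of (33) in descending order of magnitude.
The density function
$$(2\pi)^{-\frac14\mu_1(\mu_1+1)} \exp\{-\tfrac12(t_{11}^2 + \ldots + t_{\mu_1\mu_1}^2 + t_{12}^2 + \ldots + t_{\mu_1-1,\mu_1}^2)\} \tag{34}$$
is equal to
$$(2\pi)^{-\frac14\mu_1(\mu_1+1)} \exp\Big(-\tfrac12\sum_{i=1}^{\mu_1} \eta_i'^2\Big), \tag{35}$$
which is a function of the latent roots only. Hence* the distribution of the latent roots has the density
$$f(\eta'_1, \ldots, \eta'_{\mu_1}) = 2^{-\frac12\mu_1} \Big(\prod_{i=1}^{\mu_1} \Gamma(\tfrac12 i)\Big)^{-1} \Big\{\prod_{i=1}^{\mu_1}\prod_{j=i+1}^{\mu_1} (\eta'_i - \eta'_j)\Big\} \exp\Big(-\tfrac12\sum_{i=1}^{\mu_1} \eta_i'^2\Big), \tag{36}$$
$$\infty > \eta'_1 \ge \ldots \ge \eta'_{\mu_1} > -\infty.$$
The density function (36) represents the limiting distribution of $\eta'_1, \ldots, \eta'_{\mu_1}$, where
$$\theta_i = \rho_1^2 + 2n^{-\frac12}\rho_1(1 - \rho_1^2)\eta'_i \qquad (i = 1, \ldots, \mu_1). \tag{37}$$
Hence
$$r_i = \theta_i^{\frac12} = \rho_1 + n^{-\frac12}(1 - \rho_1^2)\eta'_i + o(n^{-\frac12}) \qquad (i = 1, \ldots, \mu_1).$$
If we define $\eta_1, \ldots, \eta_{\mu_1}$ by
$$r_i = \rho_1 + n^{-\frac12}(1 - \rho_1^2)\eta_i \qquad (i = 1, \ldots, \mu_1), \tag{38}$$
then the $\eta_i = n^{\frac12}(1 - \rho_1^2)^{-1}(r_i - \rho_1)$ have the same limiting distribution as the $\eta'_i$. Hence the limiting distribution of $\eta_1, \ldots, \eta_{\mu_1}$ has the density $f(\eta_1, \ldots, \eta_{\mu_1})$.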
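In exactly the same manner we may prove that for $k = 1, \ldots, \nu$ the
$$\eta_i = n^{\frac12}(1 - \rho_k^2)^{-1}(r_i - \rho_k) \qquad (i = \mu_1 + \ldots + \mu_{k-1} + 1, \ldots, \mu_1 + \ldots + \mu_k)$$
have the limiting distribution represented by the density
$$f(\eta_{\mu_1 + \ldots + \mu_{k-1} + 1}, \ldots, \eta_{\mu_1 + \ldots + \mu_k}),$$
where, in general,
$$f(x_1, \ldots, x_m) = 2^{-\frac12 m} \Big(\prod_{i=1}^{m} \Gamma(\tfrac12 i)\Big)^{-1} \Big\{\prod_{i=1}^{m}\prod_{j=i+1}^{m} (x_i - x_j)\Big\} \exp\Big(-\tfrac12\sum_{i=1}^{m} x_i^2\Big), \tag{39}$$
$$\infty > x_1 \ge \ldots \ge x_m > -\infty.$$

* Hsu (1939), p. 256, Theorem 2.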


Furthermore, the sets $(\eta_1, \ldots, \eta_{\mu_1})$, $(\eta_{\mu_1+1}, \ldots, \eta_{\mu_1+\mu_2})$, ..., $(\eta_{\mu_1 + \ldots + \mu_{\nu-1} + 1}, \ldots, \eta_s)$ are such that the equations corresponding to (27) for two different sets involve only mutually uncorrelated u's, v's and w's owing to (ii) of Lemma 2. Therefore the limiting distribution must be such that these sets are independent of one another. Also, recalling (14) and (i) of Lemma 2, it is seen that the limiting distribution of $\eta_1, \ldots, \eta_p$ must be such that the sets $(\eta_1, \ldots, \eta_{\mu_1})$, $(\eta_{\mu_1+1}, \ldots, \eta_{\mu_1+\mu_2})$, ..., $(\eta_{\mu_1 + \ldots + \mu_{\nu-1} + 1}, \ldots, \eta_s)$ and $(\eta_{s+1}, \ldots, \eta_p)$ are independent of one another.

In conclusion we sum up the results in the following theorem:


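THEOREM. Let the population canonical correlations be $\rho'_1, \ldots, \rho'_p$, where
$$\begin{aligned} \rho'_1 = \ldots = \rho'_{\mu_1} &= \rho_1, \\ \rho'_{\mu_1+1} = \ldots = \rho'_{\mu_1+\mu_2} &= \rho_2, \\ &\ \,\vdots \\ \rho'_{\mu_1 + \ldots + \mu_{\nu-1} + 1} = \ldots = \rho'_s &= \rho_\nu, \\ \rho'_{s+1} = \ldots = \rho'_p &= 0, \end{aligned}$$
$$\rho_1 > \rho_2 > \ldots > \rho_\nu > 0.$$
Let the sample canonical correlations be $r_1, \ldots, r_p$, where
$$r_1 \ge r_2 \ge \ldots \ge r_p.$$
Let
$$\eta_i = n^{\frac12}(1 - \rho_i'^2)^{-1}(r_i - \rho'_i) \qquad (i = 1, \ldots, p).$$
Then the limiting distribution of $\eta_1, \ldots, \eta_p$ is represented by the density function
$$f(\eta_1, \ldots, \eta_{\mu_1})\,f(\eta_{\mu_1+1}, \ldots, \eta_{\mu_1+\mu_2}) \ldots f(\eta_{\mu_1 + \ldots + \mu_{\nu-1} + 1}, \ldots, \eta_s)\,f_1(\eta_{s+1}, \ldots, \eta_p),$$
where the functions $f$ and $f_1$ are given by (39) and (18) respectively.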


REFERENCES 


CRAMÉR, H. (1937). Random Variables and Probability Distributions. Camb. Univ. Press.
GIRSCHICK, M. A. (1939). "On the sampling theory of roots of determinantal equations." Ann. Math. Statist. 10, 205.
HOTELLING, H. (1936). "Relations between two sets of variates." Biometrika, 28, 321-77.
HSU, P. L. (1939). "On the distribution of roots of certain determinantal equations." Ann. Eugen., Lond., 9, 250-8.
HSU, P. L. (1940). "On the limiting distribution of roots of a determinantal equation." Proc. Lond. Math. Soc. (at press).
MADOW, W. G. (1938). "Contributions to the theory of multi-variate statistical analysis." Trans. Amer. Math. Soc. 44, 454.











THE APPLICATION OF MAXIMUM LIKELIHOOD TO 
DOSAGE-MORTALITY CURVES 


By F. GARWOOD, Ph.D.


1. INTRODUCTION


Many papers have been written on the fitting of dosage-mortality curves; in 
particular, a paper by Irwin & Cheeseman (1939) summarizes the methods 
which have been adopted hitherto. It is felt, however, that there is some mathe- 
matical interest in the subject which is worth emphasizing. 

A typical problem occurs when studying the effect of some drug on a particular kind of animal. It is assumed that there is a population of animals, and associated with each individual animal is a certain lethal dose of the drug, such that the animal would always be killed by a stronger dose and would survive a weaker one. There is independent biological evidence for assuming that the logarithms of the lethal doses are normally distributed throughout the population, so that if the proportion of animals expected to survive a given dose is converted into a probit (i.e. an equivalent normal deviate + 5), then the above assumption is equivalent to stating that the probits are linearly related to the logs of the doses. If the mean (or median) log lethal dose is $m$ and the standard deviation is $\sigma$, then the linear relation between probit and log lethal dose is


$$Y = \alpha + \beta x,$$
where
$$m = \frac{5 - \alpha}{\beta} \quad \text{and} \quad \sigma = \frac{1}{\beta}.$$
The experimental material consists of $k$ groups, drawn at random from the population, of $n_1, n_2, \ldots, n_k$ animals, which are given doses with logs $x_1, x_2, \ldots, x_k$, from which there are $s_1, s_2, \ldots, s_k$ survivors, and $n_1 - s_1, n_2 - s_2, \ldots, n_k - s_k$ deaths.

The treatment which has hitherto been applied to data of this kind consists of obtaining from tables the probits $y_1, y_2, \ldots$ corresponding to the proportions of survivors $q_1 = s_1/n_1$, $q_2 = s_2/n_2$, ..., plotting the $y$'s against the corresponding log doses $x_1, x_2, \ldots$ and fitting a line to the points, bearing in mind the following considerations.

Since $dq/dy = -Z$, $Z$ being the ordinate of the normal curve, and as the variance of $q$ is $PQ/n$, $Q$ $(= 1 - P)$ being the expected proportion of survivors, it follows that the variance of $y$ is $PQ/nZ^2$ which in general varies along the line. Thus different weights must be used for the various probits in fitting the line. The effects of using different methods of calculating the weighting coefficient $w = nZ^2/PQ$ (reciprocal of the variance) have been compared by Irwin & Cheeseman (1939).




A further difficulty occurred in the cases of zero and all survivors, for which 
the corresponding probits are infinite. Fisher’s method (Bliss, 1935), using the 
method of maximum likelihood, overcame this difficulty by replacing the infinite 
probit by a working or fictitious probit in the regression equations. 

Mathematically, Fisher's exact method of calculation is as follows (see also Bliss, 1938 and Fisher & Yates, 1938). Assume rough values $a_1$ and $b_1$ in the relation
$$Y = a_1 + b_1x.$$
For each value of $x$ this formula gives $P$ and $Q$ (the areas of the normal curve up to and beyond the probit $Y$), $Z$ (the ordinate at $Y$) and $w = nZ^2/PQ$. The regression is then found between the variate
$$y = Y + \eta = a_1 + b_1x + \eta, \tag{1}$$
where
$$\eta = \frac{Q - q}{Z},$$
and $x$, giving weights $w$ to the former. The result is a new regression equation
$$Y = a_2 + b_2x,$$
where
$$b_2 = \frac{Swx(y - \bar{y})}{Sw(x - \bar{x})^2}$$
and
$$a_2 = \bar{y} - b_2\bar{x}.$$


It is to be noted that this form of the regression equation is more convenient for our purposes than the form $Y = \bar{y} + b_2(x - \bar{x})$. Substituting the values of $y$ from (1), it follows that
$$b_2 = b_1 + \frac{Swx(\eta - \bar{\eta})}{Sw(x - \bar{x})^2}$$
and
$$a_2 = a_1 + \bar{\eta} - \bar{x}(b_2 - b_1).$$

Hence the changes $\delta a = a_2 - a_1$ and $\delta b = b_2 - b_1$ in the regression coefficients of $y = Y + \eta$ on $x$ are in fact the regression coefficients of $\eta$ on $x$, and they can be regarded as the solutions of the normal equations
$$\delta a\,Sw + \delta b\,Swx = Sw\eta, \tag{2}$$
$$\delta a\,Swx + \delta b\,Swx^2 = Swx\eta. \tag{3}$$

The new regression equation $Y = a_2 + b_2x$ is then made the basis of a similar calculation; i.e. the regression is calculated between $a_2 + b_2x + \eta$ and $x$ (the values of $\eta$ will be changed since $Q$ and $Z$ are in general altered), giving another equation $Y = a_3 + b_3x$ and so on. The process is continued until no change occurs in the coefficients.

It is possible that some arithmetical labour might be saved by obtaining the corrections $\delta a$, $\delta b$ to the regression coefficients $a$, $b$ at each stage, instead of the new coefficients $a + \delta a$, $b + \delta b$, by calculating the regression between $\eta$ and $x$, but this has not been investigated. The process of obtaining the corrections is illustrated later (Table III).
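The cycle of calculation described above can be sketched in a few lines. The following is a minimal illustration, not the author's computing scheme: it works with the corrections $\delta a$, $\delta b$ obtained from the normal equations (2) and (3), and the data (log doses, group sizes and survivors), the starting values and the stopping rule are all hypothetical figures chosen here only to make the example run.

```python
import math


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def working_values(a, b, x, n, s):
    """Weights w = n Z^2 / (P Q) and adjustments eta = (Q - q) / Z for Y = a + b x."""
    weights, etas = [], []
    for xi, ni, si in zip(x, n, s):
        deviate = a + b * xi - 5.0      # probit Y less 5
        P = normal_cdf(deviate)         # area up to the probit
        Q = 1.0 - P                     # area beyond: expected proportion of survivors
        Z = normal_pdf(deviate)         # ordinate at the probit
        weights.append(ni * Z * Z / (P * Q))
        etas.append((Q - si / ni) / Z)  # q = s / n is the observed proportion
    return weights, etas


def fisher_cycle(a, b, x, n, s):
    """One cycle: solve the normal equations (2), (3) for (da, db) and apply them."""
    w, eta = working_values(a, b, x, n, s)
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sweta = sum(wi * ei for wi, ei in zip(w, eta))
    swxeta = sum(wi * xi * ei for wi, xi, ei in zip(w, x, eta))
    determinant = sw * swxx - swx * swx
    da = (sweta * swxx - swx * swxeta) / determinant
    db = (sw * swxeta - swx * sweta) / determinant
    return a + da, b + db


def fit_probit_line(x, n, s, a=4.0, b=2.0, tol=1e-10, max_cycles=200):
    """Repeat the cycle until the coefficients no longer change (to within tol)."""
    for _ in range(max_cycles):
        a_new, b_new = fisher_cycle(a, b, x, n, s)
        if max(abs(a_new - a), abs(b_new - b)) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b


# Hypothetical data, for illustration only.
log_doses = [0.0, 0.3, 0.6, 0.9, 1.2]
group_sizes = [20, 20, 20, 20, 20]
survivors = [18, 14, 9, 4, 1]

a_hat, b_hat = fit_probit_line(log_doses, group_sizes, survivors)

if __name__ == "__main__":
    print(a_hat, b_hat)
    print("m =", (5.0 - a_hat) / b_hat, "sigma =", 1.0 / b_hat)
```

At the limiting values the right-hand sides $Sw\eta$ and $Swx\eta$ of (2) and (3) vanish, which gives a convenient check that the process has converged.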
















In general the successive coefficients a,, and b,, will converge to limits (and the 
successive corrections da, db to zero), and it is not difficult to see that these limits 
are the solutions of the maximum likelihood equations. The process is, in fact, the 
same as the general method suggested by Fisher for solving maximum likelihood 
equations, as will be shown later. Also, a consideration of the foundations of this 
method shows that there is a method of obtaining the maximum likelihood 
estimates which is slightly more rapid (as regards numbers of approximations) 
than that outlined above. 

It is one of the objects of this note to point out that the problem may be
regarded as one of estimating the parameters $m$ and $\sigma$ (or equally, $\alpha$ and $\beta$),
i.e. of fitting a normal curve to the data. The fact that this is equivalent 
to fitting a line to the theoretical relation between probits and log lethal doses is 
only a consequence of the special nature of the normal distribution. It may happen 
in other applications that the distribution of the log lethal doses is not normal and 
cannot be normalized by any transformation of the log lethal doses but has 
another form depending on unknown parameters; then the problem can only be 
regarded in general as the estimation of parameters from observations. 

In the case of the normal distribution the method of plotting probits is of 
course a very convenient method of representing the data and of obtaining a good 
general picture, but it is to be emphasized that from the theoretical viewpoint it is 
at least equally important to interpret the problem as one of estimating parameters
as to regard it as a problem of fitting a regression line in the ordinary sense of the
term.


2. GENERAL MAXIMUM LIKELIHOOD ESTIMATES 


It will be convenient to recapitulate the method used by Fisher for solving
maximum likelihood equations (see, e.g., Koshal, 1933). A sample $x_1, x_2, \ldots, x_n$
is drawn at random from a population of which the frequency function has a
known form depending on $s$ unknown parameters $\theta_1, \theta_2, \ldots, \theta_s$, so that the
probability of obtaining the sample is

$$P(x_1, x_2, \ldots, x_n;\ \theta_1, \theta_2, \ldots, \theta_s).$$

If the variates $x$ are independent of each other, as in the case of successive samples
from the same population, then $P$ is the product of $n$ probability functions
$f(x_1; \theta_1, \theta_2, \ldots)$, $f(x_2; \theta_1, \theta_2, \ldots)$, .... On the other hand, they will not be
independent if they are a set of frequencies with a fixed total.

The maximum likelihood estimates of the unknown parameters $\theta_1, \theta_2, \ldots$,
based on the information provided by the sample, are the values which satisfy
the maximum likelihood equations $\partial P/\partial\theta_1 = 0$, $\partial P/\partial\theta_2 = 0$, .... Using the
likelihood function
$$L = L(x_1, x_2, \ldots;\ \theta_1, \theta_2, \ldots) = \log P,$$
the estimates must be solutions of $\partial L/\partial\theta_1 = 0$, etc.


F. Garwoop 49 


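Since they are functions of the sample, these solutions can be written
$T_1 = T_1(x_1, x_2, \ldots)$, $T_2$, ..., etc.; if approximations $t_1, t_2, \ldots$ to $T_1, T_2, \ldots$
are found by some rough method, suppose that $T_1 = t_1 + \delta t_1$, etc., so that
$\delta t_1, \delta t_2, \ldots$ are the errors in the approximations. Then we have

$$0 = \frac{\partial L}{\partial\theta_1}(T_1, T_2, \ldots)
   = \frac{\partial L}{\partial\theta_1}(t_1, t_2, \ldots)
   + \delta t_1 \frac{\partial^2 L}{\partial\theta_1^2}(t_1, t_2, \ldots)
   + \delta t_2 \frac{\partial^2 L}{\partial\theta_1\,\partial\theta_2}(t_1, t_2, \ldots) + \ldots, \qquad (4)$$

etc., ignoring terms in $\delta t^2$, etc. Writing
$$\frac{\partial L}{\partial\theta_1}(t_1, t_2, \ldots) = L_1, \qquad
  \frac{\partial^2 L}{\partial\theta_1^2}(t_1, t_2, \ldots) = L_{11}, \ \text{etc.},$$
the equations become $L_{11}\,\delta t_1 + L_{12}\,\delta t_2 + \ldots = -L_1$, etc., or
$$\sum_{j=1}^{s} L_{ij}\,\delta t_j = -L_i \qquad (i = 1, \ldots, s). \qquad (5)$$

The solutions of these equations can be written
$$\delta t_i = -\sum_{j=1}^{s} l_{ij} L_j \qquad (i = 1, \ldots, s),$$
where, in matrix notation, $\{l_{ij}\} = \{L_{ij}\}^{-1}$, the reciprocal matrix of $\{L_{ij}\}$. Thus if
$\Delta$ is the determinant formed by the elements $L_{ij}$, and $\Delta_{ij}$ the determinant obtained
by omitting row $i$ and column $j$, then
$$l_{11} = \frac{\Delta_{11}}{\Delta}, \quad l_{12} = -\frac{\Delta_{12}}{\Delta}, \quad
  l_{13} = \frac{\Delta_{13}}{\Delta}, \ \ldots, \quad l_{ij} = (-1)^{i+j}\frac{\Delta_{ij}}{\Delta}.$$
For two variables
$$l_{11} = \frac{L_{22}}{\Delta}, \quad l_{12} = l_{21} = -\frac{L_{12}}{\Delta}, \quad
  l_{22} = \frac{L_{11}}{\Delta}, \quad \Delta = L_{11}L_{22} - L_{12}^2.$$

The corrections $\delta t$ will not be exact, since terms of higher order have been omitted
from (4); however, if the process is repeated with $t_1 + \delta t_1$, $t_2 + \delta t_2$, ..., now used as
the first approximations, the corrections obtained will be of the next order of
smallness, and the process can be continued until the approximations $t$ are as
near as desired to the exact estimates $T$.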
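The repeated correction just described can be sketched for two parameters, using the explicit reciprocal matrix given above. The sketch below is illustrative only: the function names and the test case (the mean $m$ and standard deviation $\sigma$ of a normal sample of five made-up observations) are assumptions chosen because the exact estimates are known in closed form.

```python
import math


def newton_two_parameters(score, hessian, t1, t2, tol=1e-12, max_iter=50):
    """Repeat the corrections of equations (5) for two parameters."""
    for _ in range(max_iter):
        L1, L2 = score(t1, t2)
        L11, L12, L22 = hessian(t1, t2)
        delta = L11 * L22 - L12 ** 2
        # Elements of the reciprocal matrix {l_ij} for two variables.
        l11, l12, l22 = L22 / delta, -L12 / delta, L11 / delta
        dt1 = -(l11 * L1 + l12 * L2)
        dt2 = -(l12 * L1 + l22 * L2)
        t1, t2 = t1 + dt1, t2 + dt2
        if abs(dt1) < tol and abs(dt2) < tol:
            break
    return t1, t2


# Hypothetical sample; L = -n log(sigma) - S(x - m)^2 / (2 sigma^2) + const.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(sample)


def score(m, sigma):
    d1 = sum(x - m for x in sample)
    d2 = sum((x - m) ** 2 for x in sample)
    return d1 / sigma ** 2, -n / sigma + d2 / sigma ** 3


def hessian(m, sigma):
    d1 = sum(x - m for x in sample)
    d2 = sum((x - m) ** 2 for x in sample)
    return (-n / sigma ** 2,
            -2.0 * d1 / sigma ** 3,
            n / sigma ** 2 - 3.0 * d2 / sigma ** 4)


# Rough first approximations, then repeated correction.
m_hat, sigma_hat = newton_two_parameters(score, hessian, 2.5, 1.2)
print(m_hat, sigma_hat)
```

For this sample the exact estimates are the sample mean and the root mean square deviation, so the iteration can be checked against them directly.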

The coefficients $L_{ij}$ are functions of the observations as well as the approximations
$t$; it is in some practical cases more convenient to replace the $x$'s by their
expected values, i.e. the values which they would be expected to have if $\theta_1 = t_1$,
$\theta_2 = t_2$, .... Let $\Lambda_{ij}$ be the resulting values of $L_{ij}$, so that equations (5) become

$$\sum_{j=1}^{s} \Lambda_{ij}\,\delta t_j = -L_i \qquad (i = 1, \ldots, s)$$

with solutions
$$\delta t_i = -\sum_{j=1}^{s} \lambda_{ij} L_j \qquad (i = 1, \ldots, s),$$
where $\{\lambda_{ij}\} = \{\Lambda_{ij}\}^{-1}$.

As before, the corrected values $t + \delta t$ can be used as the basis of the new calculation
and the process repeated to any desired accuracy. It is to be expected that the
replacement of $L_{ij}$ by its expected value $\Lambda_{ij}$ will result in the approximation being
less rapid; this is confirmed in particular examples by calculations given below,
the difference being small, however.


The method has been used by Koshal (1933, 1939) in fitting a Pearson Type I
curve,
$$y = y_0 (x - \alpha)^{m_1} (\beta - x)^{m_2},$$
to a set of frequency data. First approximations to the maximum likelihood
estimates of the four unknown parameters $\alpha$, $\beta$, $m_1$, $m_2$ (or $\theta_1, \theta_2, \theta_3, \theta_4$) were found
by actually calculating a set of values of $L$ and estimating its maximum position.
The above method was then applied. In this case, if a typical group frequency is $n$
and the expected proportion is $p$, we have, $S$ denoting summation over the groups,

$$L = \text{constant} + S\, n \log p, \qquad L_i = S\,\frac{n\,p_i}{p},$$
and
$$L_{ij} = -S\,\frac{n\,p_i p_j}{p^2} + S\,\frac{n\,p_{ij}}{p}.$$

As before, $p_i$ denotes $\partial p/\partial\theta_i$ and $p_{ij}$ denotes $\partial^2 p/\partial\theta_i\,\partial\theta_j$. The expected value of the
last term is
$$N\,S\,p_{ij} = N\,\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\,S\,p = 0,$$
where $N = S\,n$ = total frequency, so that
$$\Lambda_{ij} = -N\,S\,\frac{p_i p_j}{p},$$
which were the values used by Koshal.

The covariance matrix* of the estimates $T$ is approximately equal to $\{-\lambda_{ij}\}$
(Fisher, 1922); thus the variance of $T_i$ is approximately $-\lambda_{ii}$ and the covariance
of $T_i$ and $T_j$ is approximately $-\lambda_{ij}$. The degree of approximation is such that
terms of a higher order in $1/n$ are omitted.

* This has been used by Irwin & Cheeseman (1939) to derive the formulae for the
variances and covariances of the estimates a and b of the parameters in the lethal dose
distribution.

To apply the method to dosage-mortality problems, suppose the probability
of death is a function $P$ of the dose (or log dose) $x$ and of unknown parameters
$\theta_1, \theta_2, \ldots$. The combined probability of the given set of survivors is
$$\prod \frac{n!}{s!\,(n-s)!}\, P^{\,n-s} Q^{\,s},$$
so that
$$L = \text{const.} + S\,[(n-s)\log P + s \log Q],$$
and
$$L_i = S\,\frac{n(Q-q)}{PQ}\,P_i,$$
where $q = s/n$ = observed proportion of survivors; therefore
$$L_{ij} = S\left[\frac{n(Q-q)}{PQ}\,P_{ij} - n P_i P_j\left(\frac{1-q}{P^2} + \frac{q}{Q^2}\right)\right]$$
and
$$\Lambda_{ij} = -S\,\frac{n P_i P_j}{PQ}.$$


If the distribution of lethal doses (or log lethal doses) depends only on parameters
of scale and location, we have

$$P = P(\alpha + \beta x) = P(Y);$$

therefore $P_1 = P'(\alpha + \beta x) = Z$, and $P_2 = xZ$,
where $Z$ is the ordinate of the distribution; thus
$$L_1 = S\,\frac{nZ}{PQ}(Q-q), \qquad L_2 = S\,\frac{x\,nZ}{PQ}(Q-q).$$
Putting
$$\frac{nZ}{PQ}(Q-q) = w\eta = \zeta \quad\text{and}\quad \frac{nZ^2}{PQ} = w,$$
we have $L_1 = S\zeta$, $L_2 = S x\zeta$.
Also $L_{11} = S\zeta'$, $L_{12} = S x\zeta'$, $L_{22} = S x^2\zeta'$, i.e.
$$L_{11} = S\,\frac{nZ(Q-q)}{PQ}\left(\frac{Z'}{Z} - \frac{Z}{P} + \frac{Z}{Q}\right) - S\,\frac{nZ^2}{PQ}
        = -Sw + S\psi\zeta, \qquad (6)$$
where
$$\psi = \frac{Z'}{Z} - \frac{Z}{P} + \frac{Z}{Q}.$$
Similarly
$$L_{12} = -Swx + S\psi x\zeta, \qquad (7)$$
$$L_{22} = -Swx^2 + S\psi x^2\zeta. \qquad (8)$$
The expected value of $\zeta$ is zero, so that
$$\Lambda_{11} = -Sw, \qquad \Lambda_{12} = -Swx, \qquad \Lambda_{22} = -Swx^2.$$
The equations for the corrections $\delta a$, $\delta b$ to approximations $a_1$, $b_1$ to the
maximum likelihood estimates, using the "expected" coefficients $\Lambda$, are thus
$$\delta a\, Sw + \delta b\, Swx = Sw\eta,$$
$$\delta a\, Swx + \delta b\, Swx^2 = Swx\eta,$$
i.e. the same as equations (2) and (3). Thus Fisher's exact maximum likelihood
method of correcting the regression line is exactly equivalent, in the case of any
distribution defined by parameters of scale and position, to the above method of
calculating successive approximations to the maximum likelihood solutions. In
the case of the normal distribution, it should be noted that if $a$ and $b$ are the
maximum likelihood estimates of $\alpha$ and $\beta$, then $\bar{x} = (5-a)/b$ and $s = 1/b$ are the
maximum likelihood estimates of the mean and standard deviation $m$ and $\sigma$,
since it is easy to see that these values satisfy $\partial L/\partial m = 0$, $\partial L/\partial\sigma = 0$. In other
words, the problem can equally be regarded as the estimation of the population
mean and standard deviation from the data.
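The successive approximation with the "expected" coefficients can be sketched in Python for the normal distribution. This is a minimal sketch under stated assumptions: the function names are hypothetical, probits are taken as 5 plus the normal deviate as in the text, the normal areas come from `math.erfc` rather than printed tables, and the data are those of Example (ii) (seven doses, five mice each), started from the approximation $Y = 7 + x$.

```python
import math

SQRT2 = math.sqrt(2.0)


def normal_parts(u):
    """Return (P, Q, Z) for the unit normal curve at deviate u = Y - 5."""
    q = 0.5 * math.erfc(u / SQRT2)                    # upper tail area Q
    z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return 1.0 - q, q, z


def method_two_step(x, n, s, a, b):
    """One pair of corrections (da, db) from the weighted normal equations."""
    sw = swx = swxx = sw_eta = swx_eta = 0.0
    for xi, ni, si in zip(x, n, s):
        p, q, z = normal_parts(a + b * xi - 5.0)
        eta = (q - si / ni) / z                       # eta = (Q - q) / Z
        w = ni * z * z / (p * q)                      # w = n Z^2 / (P Q)
        sw += w
        swx += w * xi
        swxx += w * xi * xi
        sw_eta += w * eta
        swx_eta += w * xi * eta
    det = sw * swxx - swx * swx
    return ((sw_eta * swxx - swx * swx_eta) / det,
            (sw * swx_eta - swx * sw_eta) / det)


def fit_method_two(x, n, s, a, b, tol=1e-8, max_iter=100):
    """Repeat the corrections until they are negligible."""
    for _ in range(max_iter):
        da, db = method_two_step(x, n, s, a, b)
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Example (ii): x, group sizes n and numbers of survivors s.
x = [-3, -2, -1, 0, 1, 2, 3]
n = [5] * 7
s = [4, 3, 2, 0, 0, 0, 0]

first_da, first_db = method_two_step(x, n, s, 7.0, 1.0)
a_hat, b_hat = fit_method_two(x, n, s, 7.0, 1.0)
print(first_da, first_db, a_hat, b_hat)
```

The first pair of corrections and the limiting coefficients can be compared with the method II entries for Example (ii) in Table I; small differences in the last place are to be expected, since the sketch does not round intermediate values.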

Furthermore, it may happen from some peculiarity of the experimental
material that each experiment consists of one item, i.e. $n_1 = n_2 = \ldots = 1$, and
the number of survivors is either 0 or 1. For example, the dose might represent
some quantity which can be measured but not controlled.

Provided always that there is independent evidence on which to base the assumption
of the normal distribution (or of some other known form), there appears to
be no reason why such data should not be effective for the purpose of drawing
inferences about the population.

It is true that for the purpose of testing the hypothesis, say of normality, it
will be necessary to group the data, and this will also be efficacious in obtaining a
provisional probit line, i.e. first approximations to $a$ and $b$; but for the exact
estimation of the population parameters this is unnecessary (and would, in fact,
result in a loss of information), for there is no difficulty in carrying out the
calculations given above, or illustrated later in Tables II and III, with the values
of $q$ equal to 0 or 1. On the other hand, the exact problem cannot be regarded as
one of fitting a line to the plotted probits, since all the latter are infinite in one
direction or the other.

Another convenient way of regarding the problem of finding the maximum
likelihood estimates is a geometrical one. We require to find the values $a$ and $b$ of
$\alpha$ and $\beta$ which are such that $L_1 = \partial L(\alpha, \beta)/\partial\alpha$ and $L_2 = \partial L(\alpha, \beta)/\partial\beta$ are zero.
Regarding $\alpha$, $\beta$, $\gamma$ as cartesian co-ordinates in three-dimensional space, we require
to find the point $P(a, b, 0)$ where the two surfaces $\gamma = L_1$, $\gamma = L_2$ and the plane
$\gamma = 0$ meet. This is the same as the point of intersection of the curve $L_1 = 0$ in
the plane $\gamma = 0$ (the horizontal plane) and the curve $L_2 = 0$ in the same plane.
If $P_1(a_1, b_1, 0)$ is an approximation to $P$, the tangent plane to the first surface at
the point vertically above $P_1$ cuts the horizontal plane in a line which is near the
first curve.

Using co-ordinates $\delta a$, $\delta b$ with reference to $P_1$ as origin, this line has the equation

$$\frac{\partial L(a_1, b_1)}{\partial\alpha} + \delta a\,\frac{\partial^2 L(a_1, b_1)}{\partial\alpha^2}
  + \delta b\,\frac{\partial^2 L(a_1, b_1)}{\partial\alpha\,\partial\beta} = 0,$$

or in the simpler notation

$$L_1 + \delta a\, L_{11} + \delta b\, L_{12} = 0.$$

Similarly there is a line near the second curve with the equation

$$L_2 + \delta a\, L_{12} + \delta b\, L_{22} = 0,$$

and the intersection $P_2$ of these lines is a closer approximation to $P$ than $P_1$. The
values of $L_{11}$, etc. are given in equations (6), (7) and (8) and the effect of replacing
them by their expected values $\Lambda_{11}$, etc. is to replace the tangent planes by planes
through the points of contact differing slightly in direction, and to replace the
above lines by lines whose equations are given by (2) and (3).


3. THE GOODNESS OF FIT TEST 


In the regression line treatment of the problem, the test of the hypothesis of
normality is provided by testing the residual variance $\chi^2$ about the regression line,
with degrees of freedom two less than the number of groups. From the point of
view we are considering, it would appear more natural to calculate $\chi^2$ as in the
comparison of observed and expected frequencies, i.e.

$$\chi^2 = S\,\frac{(n - s - nP)^2}{nP} + S\,\frac{(s - nQ)^2}{nQ}
        = S\,\frac{n(Q-q)^2}{PQ}
        = S\,\frac{nZ^2}{PQ}\left(\frac{Q-q}{Z}\right)^2 = Sw\eta^2,$$

with $k - 2$ degrees of freedom, $k$ being the number of groups.
The residual variance about the regression line is
$$Sw(\eta - \delta a - x\,\delta b)^2 = Sw\eta^2 - \delta a\, Sw\eta - \delta b\, Swx\eta,$$
so that the two values of $\chi^2$ are identical when $\delta a = 0$, $\delta b = 0$, i.e. when the
maximum likelihood estimates have been approached sufficiently closely.
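The identity between the frequency form of $\chi^2$ and the weighted form $Sw\eta^2$ can be checked numerically. The following is a sketch only: the function name is hypothetical, the normal areas are computed with `math.erfc`, and the data and coefficients are those of Example (ii) with the final approximation quoted in Table I.

```python
import math


def chi_square_two_ways(x, n, s, a, b):
    """Return chi-square from frequencies and from the weighted residuals."""
    chi_freq = 0.0
    chi_weighted = 0.0
    for xi, ni, si in zip(x, n, s):
        u = a + b * xi - 5.0
        q = 0.5 * math.erfc(u / math.sqrt(2.0))       # expected proportion surviving
        p = 1.0 - q
        z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        # Observed minus expected, for deaths and for survivors.
        chi_freq += (ni - si - ni * p) ** 2 / (ni * p)
        chi_freq += (si - ni * q) ** 2 / (ni * q)
        # Weighted form, w = n Z^2 / (P Q), eta = (Q - q) / Z.
        eta = (q - si / ni) / z
        w = ni * z * z / (p * q)
        chi_weighted += w * eta * eta
    return chi_freq, chi_weighted


x = [-3, -2, -1, 0, 1, 2, 3]
n = [5] * 7
s = [4, 3, 2, 0, 0, 0, 0]

chi_freq, chi_weighted = chi_square_two_ways(x, n, s, 6.5168, 0.8635)
print(chi_freq, chi_weighted)
```

The two totals agree to within rounding error for any values of the coefficients, since the identity is algebraic; it is the equality with the residual variance about the regression line that requires the corrections to have vanished.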


4. COMPARISON OF METHODS OF SOLVING MAXIMUM LIKELIHOOD EQUATIONS 


For convenience we refer to the method using equations
$\sum_{j=1}^{s} L_{ij}\,\delta t_j = -L_i$ as method I and that using the expected values of the
coefficients, viz. $\sum_{j=1}^{s} \Lambda_{ij}\,\delta t_j = -L_i$, as method II. Before comparing them
arithmetically, it is of interest to enquire whether the two are ever identical,
i.e. $L_{ij} = \Lambda_{ij}$, for a "scale-location" distribution. From (6), (7) and (8) this
requires that
$$\psi = \frac{Z'}{Z} - \frac{Z}{P} + \frac{Z}{Q} = 0.$$

Now $Z = dP/dY = P'$, so that the probability integral $P$ must satisfy the
differential equation
$$\frac{P''}{P'} - \frac{P'}{P} + \frac{P'}{1-P} = 0.$$

An integral of $P'' + P'^2 f(P) = 0$ is $P' e^{\int f(P)\,dP} = C$,
so that $P' = CP(1-P)$,
and hence
$$P = \frac{e^{B+CY}}{1 + e^{B+CY}}.$$

Since $Y = \alpha + \beta x$, we can choose values $B = 0$, $C = 2$ for the arbitrary constants,
giving
$$P = \frac{e^{2Y}}{1 + e^{2Y}},$$
and ordinate $Z = P' = \tfrac{1}{2}\operatorname{sech}^2 Y$, which is the distribution given by Fisher &
Yates (1938). Thus for this distribution $\psi = 0$ and $L_{11} = \Lambda_{11}$, $L_{12} = \Lambda_{12}$,
$L_{22} = \Lambda_{22}$; hence equations (2) and (3) will give the most rapid solution.
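That the quantity $Z'/Z - Z/P + Z/Q$ vanishes for this distribution, but not for the normal, can be verified numerically. The sketch below is illustrative; the function names are hypothetical, and the normal case takes the deviate $u = Y - 5$ so that $Z'/Z = -u$.

```python
import math


def curvature_term_logistic(y):
    """Z'/Z - Z/P + Z/Q for P = e^{2Y} / (1 + e^{2Y})."""
    p = 1.0 / (1.0 + math.exp(-2.0 * y))
    q = 1.0 - p
    z = 2.0 * p * q                        # equals (1/2) sech^2 Y
    z_prime = -2.0 * z * math.tanh(y)      # derivative of (1/2) sech^2 Y
    return z_prime / z - z / p + z / q


def curvature_term_normal(u):
    """Z'/Z - Z/P + Z/Q for the normal curve at deviate u = Y - 5."""
    q = 0.5 * math.erfc(u / math.sqrt(2.0))
    p = 1.0 - q
    z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return -u - z / p + z / q


logistic_values = [curvature_term_logistic(y) for y in (-2.0, -0.5, 0.0, 1.0, 3.0)]
normal_value = curvature_term_normal(-1.0)
print(logistic_values, normal_value)
```

The logistic values are zero apart from rounding error, while the normal value at deviate $-1$ is clearly different from zero, which is why the two methods differ for the normal distribution.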

Tables for facilitating these calculations have been given by Fisher & Yates
(1938); it would appear that Tables XII to XIV of this work supply similar tables
for fitting a distribution of the type $P = \sin^2\phi = \sin^2(\alpha + \beta x)$; here

$$\psi = -2\cot 2\phi, \qquad w = 4n, \qquad \zeta = 4n(Q-q)\operatorname{cosec} 2\phi,$$
so that
$$L_{11} = -4N - 8\,S\,n(Q-q)\cot 2\phi\,\operatorname{cosec} 2\phi, \ \text{etc.},$$
and any advantage in rapidity gained by using method I is almost certainly offset
by the simplicity in method II, since $\Lambda_{11} = -4N$, $\Lambda_{12} = -4\,S\,nx$, $\Lambda_{22} = -4\,S\,nx^2$.
The point has not been tested, since no examples have come to hand in which
a $P = \sin^2\phi$ distribution has been envisaged.

To apply method I to the normal distribution we have ordinate
$$Z = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(Y-5)^2}, \qquad \frac{Z'}{Z} = -(Y-5);$$
therefore
$$\psi = \frac{Z}{Q} - \frac{Z}{P} - (Y - 5).$$


The two methods of successive approximation have been applied to each of 
the following three sets of data taken more or less at random from those already 
published for illustrative purposes. 

Example (i). Antipneumococcus serum given to five groups of forty mice;
Wilson Smith's data (Irwin & Cheeseman, 1939, p. 179).

    Serum dose (c.c.)     x     Deaths out of 40
        0.000625         -2           33
        0.00125          -1           22
        0.0025            0            8
        0.005             1            5
        0.01              2            2

Since the increased dose resulted in more survivors, we call $q$ the proportion of
deaths.

Example (ii). Mice injected with Bact. Typhi-murium, sample A; Topley's
data (Irwin & Cheeseman, 1939, p. 180).

    Dose (mg.)     x     Survivors out of 5
      0.0625      -3            4
      0.125       -2            3
      0.25        -1            2
      0.5          0            0
      1.0          1            0
      2.0          2            0
      4.0          3            0

Example (iii). Brine shrimps, Artemia salina, in arsenical solutions having
concentrations in geometrical progression (Fisher & Yates, 1938, p. 5).

    Solution     x     Survivors out of 8
       C        -3            8
       D        -2            8
       E        -1            6
       F         0            5
       G         1            5
       H         2            1
       I         3            0





In each case the scale of x has been chosen with a central origin for convenience. 
The last example was used by Fisher & Yates (1938) to illustrate the use of a 
table (Table XI) drawn up for solving these problems; as, however, the arith- 
metical work in this note has, for purposes of comparison, been taken to more 
places of decimals than practical work demands, this table has not been used, and 
the requisite areas and ordinates of the normal curve have been taken from tables 
of the normal curve (Pearson, 1930). The results of the comparison are shown in 
Table I. 

Table I. Comparison of methods of successive approximation
to solutions of maximum likelihood equations

                      Approximation                         Successive corrections to a and b
    Example                                                   Method II             Method I
             First                  Final                     δa        δb         δa        δb
    (i)      Y = 5.5451 + 0.6689x   Y = 5.5544 + 0.6761x    0.0095    0.0074     0.0093    0.0071
                                                           -0.0002   -0.0002     0.0000    0.0001
    (ii)     Y = 7 + x              Y = 6.5168 + 0.8635x   -0.6475   -0.2034    -0.6133   -0.1952
                                                            0.1350    0.0530     0.1190    0.0533
                                                            0.0267    0.0127     0.0110    0.0053
                                                            0.0024    0.0011     0.0001    0.0001
                                                            0.0002    0.0001       -         -
    (iii)    Y = 4.6 + 0.6x         Y = 4.5658 + 0.7128x   -0.0259    0.0891    -0.0268    0.0984
                                                           -0.0080    0.0215    -0.0072    0.0142
                                                           -0.0003    0.0021    -0.0002    0.0002
                                                              -       0.0001       -         -

    Method II. Expected values of ∂²L/∂a², etc. used.
    Method I. Actual values of ∂²L/∂a², etc. used.

Most of the values given are probably exact; there may, however, be an occasional
error of one unit in the last place. The first approximation in Example (i)
is one used by Irwin & Cheeseman; as it has already been calculated from the
data, the errors in the coefficients are smaller than in Examples (ii) and (iii).
In Example (ii) the first approximation was found by fitting a line by eye to the
observed probits, and in Example (iii) the first approximation was found by a
preliminary regression calculation with the coefficients rounded off to one place
of decimals. The arithmetical details of the calculations (Example (ii)) of the 2nd
approximation by the two methods are given for illustration in Tables II and III.

It is seen from a comparison of the results that there is a slight advantage, 
from the point of view of rapidity, in method I. On the other hand, method II 
entails a little less arithmetical work, and if Fisher & Yates’ tables are used it is 


probable that this advantage would be greater, although the point has not been 
investigated. 
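Method I can be sketched in the same way for the data of Example (ii). The sketch assumes the normal distribution with probit $Y = 5 +$ normal deviate, forms the first and second derivatives from the quantities $w$, $\zeta$ and $\zeta'$ column by column, and uses hypothetical function names; it is an illustration, not a transcription of the hand calculation.

```python
import math

X = [-3, -2, -1, 0, 1, 2, 3]      # dose co-ordinates, Example (ii)
N = [5] * 7                       # mice per group
S = [4, 3, 2, 0, 0, 0, 0]         # survivors


def derivatives(a, b):
    """Return L1, L2, L11, L12, L22 at the approximation Y = a + b x."""
    l1 = l2 = l11 = l12 = l22 = 0.0
    for x, n, s in zip(X, N, S):
        u = a + b * x - 5.0
        q = 0.5 * math.erfc(u / math.sqrt(2.0))
        p = 1.0 - q
        z = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        w = n * z * z / (p * q)
        zeta = n * z * (q - s / n) / (p * q)
        curvature = z / q - z / p - u             # Z'/Z - Z/P + Z/Q for the normal curve
        zeta_prime = curvature * zeta - w
        l1 += zeta
        l2 += x * zeta
        l11 += zeta_prime
        l12 += x * zeta_prime
        l22 += x * x * zeta_prime
    return l1, l2, l11, l12, l22


def method_one_step(a, b):
    """Corrections (da, db) using the actual second derivatives."""
    l1, l2, l11, l12, l22 = derivatives(a, b)
    det = l11 * l22 - l12 ** 2
    return (l2 * l12 - l1 * l22) / det, (l1 * l12 - l2 * l11) / det


first_da, first_db = method_one_step(7.0, 1.0)

a, b = 7.0, 1.0
steps = 0
for _ in range(50):
    da, db = method_one_step(a, b)
    a, b = a + da, b + db
    steps += 1
    if abs(da) < 1e-8 and abs(db) < 1e-8:
        break
print(first_da, first_db, a, b, steps)
```

The first corrections and the limiting coefficients can be compared with the method I entries for Example (ii) in Table I, and the number of steps with that needed by the expected-value form of the calculation.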


Table II. Typical calculation of corrections to estimates of
parameters, Example (ii), method I

1st approximation, Y = 7 + x

    Dose (mg.)   x   n  s   q   Y-5     P*        Q*        Z         Q-q     Z(1/P+1/Q)     ζ†        ψ        ζ'†
     0.0625     -3   5  4  0.8  -1   0.15866   0.84134   0.24197   0.04134     1.8126     0.07493  -0.2374  -0.45638
     0.125      -2   5  3  0.6   0   0.50000   0.50000   0.39894  -0.10000     1.5958    -0.15958   0.0000  -0.63663
     0.25       -1   5  2  0.4   1   0.84134   0.15866   0.24197  -0.24134     1.8126    -0.43745   0.2374  -0.54245
     0.5         0   5  0  0.0   2   0.97725   0.02275   0.05399   0.02275     2.4285     0.05525   0.3180  -0.11355
     1.0         1   5  0  0.0   3   0.99865   0.00135   0.00443   0.00135     3.2874     0.00444   0.2785  -0.01333
     2.0         2   5  0  0.0   4   0.99997   0.00003   0.00013   0.00003     4.2200     0.00013   0.2200  -0.00054
     4.0         3   5  0  0.0   5   1.00000   0.00000   0.00000   0.00000     5.1900     0.00000   0.1900  -0.00001

    ζ = Z(1/P + 1/Q)(Q - q),   ψ = Z/Q - Z/P - (Y - 5),   ζ' = ψζ - w.

    L_1  = Sζ  = -0.46228,     L_2  = Sxζ  = 0.53652,
    L_11 = Sζ' = -1.7629,      L_12 = Sxζ' = 3.1704,      L_22 = Sx²ζ' = -7.2120,

    δa = (L_2 L_12 - L_1 L_22) / (L_11 L_22 - L_12²) = -0.6133,
    δb = (L_1 L_12 - L_2 L_11) / (L_11 L_22 - L_12²) = -0.1952,

    a = 7,   a + δa = 6.3867;      b = 1,   b + δb = 0.8048.

2nd approximation, Y = 6.3867 + 0.8048x.

Final approximation, Y = 6.5168 + 0.8635x.

* Five significant figures were used for P, Q and Z, where possible, but only five places
of decimals are shown in the table; similarly in Table III.

† As n = 5 in each sample, it has been omitted for convenience from ζ, ζ' and from
w, wx in Table III.


Table III. Typical calculation of corrections to estimates of
parameters, Example (ii), method II

1st approximation, Y = 7 + x

    Dose (mg.)   x   n  s   q   Y-5     P*        Q*        Z         Q-q     η = (Q-q)/Z     w†        wx†
     0.0625     -3   5  4  0.8  -1   0.15866   0.84134   0.24197   0.04134      0.1708     0.4386   -1.3158
     0.125      -2   5  3  0.6   0   0.50000   0.50000   0.39894  -0.10000     -0.2507     0.6366   -1.2732
     0.25       -1   5  2  0.4   1   0.84134   0.15866   0.24197  -0.24134     -0.9974     0.4386   -0.4386
     0.5         0   5  0  0.0   2   0.97725   0.02275   0.05399   0.02275      0.4214     0.1311    0.0000
     1.0         1   5  0  0.0   3   0.99865   0.00135   0.00443   0.00135      0.3046     0.0146    0.0146
     2.0         2   5  0  0.0   4   0.99997   0.00003   0.00013   0.00003      0.2369     0.0006    0.0012
     4.0         3   5  0  0.0   5   1.00000   0.00000   0.00000   0.00000      0.1928     0.0000    0.0000

    Swxη        =  0.5366      Swx²       =  6.9494      Sw  =  1.6601      Swx = -3.0118
    -x̄ Swη      = -0.8387      -x̄ Swx     = -5.4640      x̄   = -1.8142
    Swη(x - x̄)  = -0.3021      Sw(x - x̄)² =  1.4854      Swη = -0.4623

    δb = Swη(x - x̄) / Sw(x - x̄)² = -0.2034,       η̄ = -0.2785,
    -x̄ δb = -0.3690,                               δa = η̄ - x̄ δb = -0.6475,

    b = 1,   b + δb = 0.7966;      a = 7,   a + δa = 6.3525.

2nd approximation, Y = 6.3525 + 0.7966x.

Final approximation, Y = 6.5168 + 0.8635x.













5. SUMMARY 


The usual practical maximum likelihood treatment of dosage-mortality
problems (consisting of the transformation of percentage surviving into probits,
adjustment and weighting of the latter, and calculation of successive regression
lines) is shown to be equivalent to calculating successive corrections to the
regression coefficients. The process is exactly equivalent to the method, given
elsewhere by Fisher, of obtaining the maximum likelihood estimates of the
parameters defining the distribution. A refinement of this method, using the actual
values of the second derivatives of the likelihood function, instead of the expected
values, converges a little more rapidly when applied to the normal distribution,
but this advantage is offset by some extra arithmetical work. The two methods
are exactly equivalent only for the distribution with ordinate $Z = \tfrac{1}{2}\operatorname{sech}^2 Y$.


The writer is greatly indebted to Mr E. D. van Rest and to Prof. R. A. Fisher 
for much useful help and advice. 


REFERENCES 


Bliss, C. I. (1935). Ann. Appl. Biol. 22, 134.

Bliss, C. I. (1938). Quart. J. Pharm. 11, 192.

Fisher, R. A. (1922). Philos. Trans. A, 222, 309.

Fisher, R. A. & Yates, F. (1938). Statistical Tables for Biological, Agricultural and Medical
Research. Edinburgh & London: Oliver & Boyd.

Irwin, J. O. & Cheeseman, E. A. (1939). J. R. Statist. Soc. Suppl. 6, 174.

Koshal, R. S. (1933). J. R. Statist. Soc. 96, 303.

Koshal, R. S. (1939). Ann. Eugen., Lond., 9, 209.

Pearson, K. (1930). Tables for Statisticians and Biometricians, Part 1, 3rd ed., Biometrika
Office.







A NOTE ON FURTHER PROPERTIES OF 
STATISTICAL TESTS 


By E. S. PEARSON


Dr P. L. Hsu has suggested that I should write a short introductory note on
the origin of the idea involved in his paper and in that of Dr Simaika which
follows.* In searching some twelve years ago for a systematic method of choosing
the best test of a statistical hypothesis $H_0$, Prof. Neyman and I came to the
conclusion that an essential preliminary to any mathematical formulation of
the problem was the definition of a set of admissible alternative hypotheses,
$C(H)$. Starting from this viewpoint, our first method of selecting a test involved
the use of the likelihood ratio, but, however useful as a practical method of
attack, the principle underlying this approach was somewhat arbitrary. A more
fundamental procedure, later developed, was to choose a test paying regard
to its power function, that is to say, to the chance that its use would lead to the
rejection of $H_0$ if an alternative $H \neq H_0$ of $C(H)$ were true. It then appeared
that a number of statistical tests in common use had the remarkable property
that they maximized this chance for every alternative to $H_0$ in $C(H)$. Such
tests were termed uniformly most powerful tests of $H_0$ with regard to $C(H)$.
That there were limitations to the situations in which a uniformly most
powerful test could exist soon, however, became clear. These limitations were
gradually explored, and the following papers are further contributions to the
subject. It was found that these tests generally, though not always, concerned
the value of a single parameter. Such are tests of the hypothesis that
a mean or a standard deviation has a specified value, or that the difference
between two means or two standard deviations is zero. Further, in these cases
the class of alternatives must be restricted; thus the two-sample $t$-test of the
hypothesis that two population means $\xi_1$ and $\xi_2$ are equal, is only uniformly
most powerful for the situation in which the alternatives considered are defined
by $\xi_1 - \xi_2 > 0$ or by $\xi_1 - \xi_2 < 0$ but not for both at the same time.

In this connexion, in 1935, Kolodziejczyk was able to prove that for tests 
of a linear hypothesis, no uniformly most powerful test could exist if the 
number of parameters involved was greater than unity. This result was im- 
portant, since the majority of tests used in the analysis of variance can be 
reduced to tests of a linear hypothesis. 

This limitation of tests regarding the value of two or more parameters can
be illustrated by a geometric presentation. Since the most important features
of the problem can be illustrated when $H_0$ is a simple hypothesis concerning
the value of two parameters, I shall take this case, using notation already
adopted in this connexion.


* See pp. 62-69 and pp. 70-80 below. 











60 Further properties of statistical tests 


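Suppose that the elementary probability law of random variables $x_1, \ldots, x_n$,
whose particular values are given by observation, is of form

$$p(x_1, \ldots, x_n \mid \theta_1, \theta_2) = p(E \mid \theta_1, \theta_2), \qquad (1)$$

$\theta_1$, $\theta_2$ being the two population parameters. For a critical region $w$ of size $\alpha$
associated with a given test, we may write

$$P\{E \in w \mid \theta_1, \theta_2\} = \int \ldots \int_w p(E \mid \theta_1, \theta_2)\,dx_1 \ldots dx_n = \beta(\theta_1, \theta_2 \mid w). \qquad (2)$$

If the hypothesis $H_0$ which $w$ has been selected to test assumes that
$$\theta_1 = \theta_1^0, \qquad \theta_2 = \theta_2^0, \qquad (3)$$
then
$$\beta(\theta_1^0, \theta_2^0 \mid w) = \alpha, \qquad (4)$$

where $\alpha$ is the significance level chosen.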

A power surface may be obtained by taking rectangular axes for $\theta_1$ and $\theta_2$
in a horizontal plane and plotting $\beta(\theta_1, \theta_2 \mid w)$ as a vertical ordinate. If $w_0$ were
a critical region associated with a uniformly most powerful test of $H_0$, then its
power surface would fall nowhere below the surfaces derived from other critical
regions satisfying (4). No unique surface with this property will, however, in
general exist. If, for instance, we choose $w_0$ so that the surface will rise quickly
in the direction parallel to the axis of $\theta_1$, we shall reduce the rate of increase in
the direction of $\theta_2$, and vice versa. Power surfaces of alternative critical regions
may, in fact, cross one another in a complicated way, but no single surface can
everywhere lie above all others. If we confine attention to tests for which the
power surface has a minimum ordinate of $\alpha$ at the point $\theta_1^0, \theta_2^0$, i.e. to unbiased
tests of $H_0$, we shall still be unable to find a uniformly most powerful test in
this restricted field.

The difficulty in choice between alternative tests can, indeed, only be solved
by a further formulation of the requirements of a satisfactory test. Several
lines of attack are open:

(i) To lay down conditions for the form of the power surface in the
neighbourhood of the point $\theta_1^0, \theta_2^0$. Here we may describe the objective as to make as
large as possible the chance of detecting small departures in $\theta_1$ and $\theta_2$ from the
values specified by $H_0$. A method of approaching the problem from this point of
view leads to the development of the unbiased test of Type C (Neyman &
Pearson, 1938).

(ii) To regard it as of more importance to control the form of the power
surface at some distance from its minimum point; for example, to try to select
a critical region for which the power surface reaches the level

$$\beta(\theta_1, \theta_2 \mid w) = 0.95, \qquad (5)$$

along a contour lying inside the corresponding contour associated with any
other test. This method of approach has been examined by Dr B. L. Welch,
but his results are not yet published. It is possible that methods (i) and (ii) will
lead to the same result.

(iii) To consider whether from the practical point of view, if $H_0$ is not true,
the importance of the departure of the unknown parameters from $\theta_1^0, \theta_2^0$ can be
measured by a single parameter,

$$\lambda = f(\theta_1, \theta_2). \qquad (6)$$

If this is so, we are in fact defining a system of contours on the $\theta_1, \theta_2$ plane
along any one of which we should like the ordinates of the power surface to be
constant. Such a system would be defined, for instance, by

$$\lambda^2 = (\theta_1 - \theta_1^0)^2 + (\theta_2 - \theta_2^0)^2, \qquad (7)$$
and if $\beta(\theta_1, \theta_2 \mid w)$ is to be constant for values of $\theta_1, \theta_2$ satisfying (7), the contours
of the power surface will be circles of radius $\lambda$. Alternative tests would then be
confined to those whose power surfaces had circular contours, $H_0$ would be the
hypothesis that $\lambda = 0$ and the uniformly most powerful test, if it exists, would
be that for which
$$\beta(\lambda \mid w_0) \geq \beta(\lambda \mid w) \qquad (8)$$
for $\lambda > 0$ and all alternative critical regions $w$ satisfying the conditions stipulated.

The problem thus presented in the case of a simple hypothesis concerning
two parameters will arise in similar form when $H_0$ is composite and concerns the
value of many parameters $\theta_1, \theta_2, \ldots, \theta_s$. In a number of multivariate problems
we have reached a position in which:

(a) tests of statistical hypotheses concerning the values of several population
parameters have been derived, as well as their power functions;

(b) these power functions have been shown to depend on the value of a
single function $\lambda = f(\theta_1, \theta_2, \ldots, \theta_s)$
of the parameters considered.


In the following contributions Dr Hsu and Dr Simaika have examined three
of these tests, that concerned with the general linear hypothesis, with Hotelling's
generalized $T^2$ and with the multiple correlation coefficient. They have
shown that of tests whose power function depends only on a certain function $\lambda$
of the population parameters, the existing tests are the uniformly most powerful.
It is of course true that in the problems in question no alternative tests are at
present available or indeed likely to become so. Nevertheless, I believe that the
discovery, resulting from Dr Hsu's initiative, of the relationship between the
test function and a corresponding comprehensive collective character in the
population, has taken us a step farther in our understanding of the properties
of statistical tests. Further, this relationship between $E^2$ and $\lambda$, $T^2$ and $y^2$,
$D^2$ and $\Delta^2$, $R^2$ and $\rho^2$ seems to lead us round by another route to the problem of
statistical estimation.

REFERENCES


Kolodziejczyk, St. (1935). Biometrika, 27, 161.
Neyman, J. & Pearson, E. S. (1938). Statist. Res. Mem. 2, 25.













ANALYSIS OF VARIANCE FROM THE POWER FUNCTION 
STANDPOINT 


By P. L. HSU 


A fresh study on the classical analysis of variance tests in the light of the
Neyman-Pearson theory was started by Kolodziejczyk (1935), who formulated
the class of linear hypotheses for which these tests may be employed. As a linear
hypothesis is defined relative to the set of admissible hypotheses, the study of
the $E^2$-test (by which we denote any test falling under the usual methods of
analysis of variance) may be made with reference to its power function. P. C. Tang
(1938) showed how the power function was related to R. A. Fisher's (C) distribution
(Fisher, 1928) and so was able to appraise the chance of detecting the falsehood
of a linear hypothesis using the $E^2$-test. The great theoretical value of the power
function lies, however, in its use in comparing the relative merits of alternative
tests of the same hypothesis. In this paper we shall prove a theorem (p. 63)
which asserts that out of a certain class of tests the $E^2$-test is uniformly most
powerful.

In his paper Tang has used an orthogonal transformation in the sample space
which enabled the general linear hypothesis to be reduced to the following simple
form: Given the elementary probability law

$$p(y_1, \ldots, y_m, z_1, \ldots, z_n) = \{\sqrt{(2\pi)}\,\sigma\}^{-(m+n)}
  \exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{m}(y_i - \eta_i)^2 + \sum_{i=1}^{n} z_i^2\right\}\right], \qquad (1)$$

where all real values of $\eta_1, \ldots, \eta_m$ and all positive values of $\sigma$ are admissible, the
hypothesis is that $n_1$ ($\leq m$) of the $\eta$'s have the true value 0:

$$\eta_1 = \eta_2 = \ldots = \eta_{n_1} = 0. \qquad (2)$$
We call the above hypothesis $H_0$.
We shall set

$$E^2 = \sum_{i=1}^{n_1} y_i^2 \Big/ \left(\sum_{i=1}^{n_1} y_i^2 + \sum_{i=1}^{n} z_i^2\right), \qquad (3)$$
and call $w_0$ (of size $\epsilon$) the critical region for the rejection of $H_0$ defined by the
inequality
$$E^2 > E_0^2, \qquad (4)$$

where $E_0^2$ is a constant so determined that the probability that (4) is true, given
that (2) is true, equals $\epsilon$.


The power function of $w_0$ as given by Tang can be written

$$\beta_0(\lambda) = e^{-\lambda}\sum_{h=0}^{\infty}\frac{\lambda^h}{h!\,B(\tfrac{1}{2}n_1 + h,\ \tfrac{1}{2}n)}
  \int_{E_0^2}^{1} (E^2)^{\frac{1}{2}n_1 + h - 1}(1 - E^2)^{\frac{1}{2}n - 1}\,d(E^2), \qquad (5)$$

where
$$\lambda = \frac{1}{2\sigma^2}\sum_{i=1}^{n_1}\eta_i^2. \qquad (6)$$

An outstanding feature of the power function (5) is that it depends on the single
parameter $\lambda$. Our problem is, does there exist another critical region of size $\epsilon$
whose power function depends on $\lambda$ alone and which is more powerful than $w_0$ for
certain values of $\lambda$? The answer is contained in the following theorem and is in
the negative.

THEOREM. Suppose that the critical region $w$ satisfies the following conditions:
(a) $w$ is of size $\epsilon$,
(b) the power function of $w$ depends on the single parameter $\lambda$.
Let $\beta(\lambda)$ be the power function of $w$ and $\beta_0(\lambda)$ be the power function (5) of $w_0$. Then

$$\beta(\lambda) \leq \beta_0(\lambda) \qquad (7)$$

for all positive values of $\lambda$.


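Proof. In the place of $z_1,\dots,z_n$ we substitute spherical co-ordinates, viz. the
radius vector $r=(\sum z_i^2)^{1/2}$ and $n-1$ angles, $\theta_1,\dots,\theta_{n-1}$. We deduce from (1) that

$$p(y_1,\dots,y_m,\theta_1,\dots,\theta_{n-1},r)=p(y_1,\dots,y_{n_1})\,p(y_{n_1+1},\dots,y_m)\,p(\theta_1,\dots,\theta_{n-1})\,p(r), \tag{8}$$

where

$$p(y_1,\dots,y_{n_1})=(\sqrt{2\pi}\,\sigma)^{-n_1}\exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n_1}(y_i-\eta_i)^2\Big\}, \tag{9}$$

$$p(y_{n_1+1},\dots,y_m)=(\sqrt{2\pi}\,\sigma)^{-(m-n_1)}\exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=n_1+1}^{m}(y_i-\eta_i)^2\Big\}, \tag{10}$$

$$p(r)=2^{-\frac12(n-2)}\sigma^{-n}\{\Gamma(\tfrac12n)\}^{-1}r^{n-1}\exp\Big(-\frac{r^2}{2\sigma^2}\Big), \tag{11}$$

and $p(\theta_1,\dots,\theta_{n-1})$ is the well-known product of cosines which involves none of
the parameters $\eta_1,\dots,\eta_m$ and $\sigma$.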
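We now make the following successive transformations:

$$r=s^{1/2},\qquad s=t-\sum_{i=1}^{n_1}y_i^2,\qquad y_i=t^{1/2}u_i\quad(i=1,\dots,n_1), \tag{12}$$

and also write

$$\gamma_i=\sigma^{-1}\eta_i\quad(i=1,\dots,n_1). \tag{13}$$

It follows that

$$p(y_{n_1+1},\dots,y_m,\theta_1,\dots,\theta_{n-1},u_1,\dots,u_{n_1},t)=p(y_{n_1+1},\dots,y_m)\,p(\theta_1,\dots,\theta_{n-1})\,p(u_1,\dots,u_{n_1},t), \tag{14}$$

where

$$p(u_1,\dots,u_{n_1},t)=(\sqrt2\,\sigma)^{-(n+n_1)}\pi^{-\frac12n_1}\{\Gamma(\tfrac12n)\}^{-1}e^{-\lambda}\,t^{\frac12(n+n_1-2)}\exp\Big(-\frac{t}{2\sigma^2}\Big)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}\exp\Big(\frac{\sqrt t}{\sigma}\sum_{i=1}^{n_1}\gamma_iu_i\Big). \tag{15}$$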











64 Analysis of variance from the power function standpoint 


From now on we shall write $y$ for the set of variables $y_{n_1+1},\dots,y_m$ and $dy$ for
$dy_{n_1+1}\dots dy_m$, and use similar abbreviations $\theta$, $u$, $d\theta$ and $du$.

Suppose now that the critical region $w$ satisfies the conditions (a) and (b).
Let $\Gamma(y,\theta,u,t)$ be the characteristic function of $w$, i.e. $\Gamma(y,\theta,u,t)=1$ or 0 according
as the sample point falls within $w$ or not. Let $W$ be the sample space.
Then the power function of $w$ is


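$$\beta(\lambda)=\int_W\Gamma(y,\theta,u,t)\,p(y,\theta,u,t)\,dy\,d\theta\,du\,dt, \tag{16}$$

whence,

$$(\sqrt2\,\sigma)^{-(n+n_1)}\int_W\Gamma(y,\theta,u,t)\,p(y)\,p(\theta)\,t^{\frac12(n+n_1-2)}\exp\Big(-\frac{t}{2\sigma^2}\Big)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}dy\,d\theta\,du\,dt=\pi^{\frac12n_1}\Gamma(\tfrac12n)\,\epsilon, \tag{17}$$

$$(\sqrt2\,\sigma)^{-(n+n_1)}\int_W\Gamma(y,\theta,u,t)\,p(y)\,p(\theta)\,t^{\frac12(n+n_1-2)}\exp\Big(-\frac{t}{2\sigma^2}\Big)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}\exp\Big(\frac{\sqrt t}{\sigma}\sum_{i=1}^{n_1}\gamma_iu_i\Big)dy\,d\theta\,du\,dt$$
$$=\pi^{\frac12n_1}\Gamma(\tfrac12n)\,e^{\lambda}\beta(\lambda)=F(\lambda)=F\Big(\tfrac12\sum_{i=1}^{n_1}\gamma_i^2\Big),\ \text{say}. \tag{18}$$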
i=1 
Let W, be the sample space of 0, u and t, and put 


(./2 ao) mo | Ty, 6, u, t) pA) ti(n+n,—2) 
WwW, 


ees ae ae ee 
P 202 ee P o i," ; 
— F(A) = dy, y,a). (19) 


Then, by (18), [7 [7 se) pw dy = 0, (20) 


«© oo l m 
i.e. oq P(Yny429 -+°9 Yims Yo +++ Yay a)exe(—35, >> st) 


=m+1 
m 


<exp( 5, 2Ys) Anan + Bm = 0, (21) 


i=n, 
on writing a, for (20*)-!y, ((=n,+1,...,m).. 

Equation (20) must hold true for all real values of the a’s. Hence it follows 
from the well-known theorem on Laplace transformation* that 


OY, s)exp( 355 4 i) = 0, 


i=n,+1 


* Cf. Doetsch (1937), p. 35, Theorem 1. Though the theorem referred to is stated for the case
where the number of $y$'s is one, it may easily be extended to the case of more than one $y$ by
induction.

















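i.e. $\phi(y,\gamma,\sigma)=0$, whence, by (19),

$$(\sqrt2\,\sigma)^{-(n+n_1)}\int_{W_1}\Gamma(y,\theta,u,t)\,p(\theta)\,t^{\frac12(n+n_1-2)}\exp\Big(-\frac{t}{2\sigma^2}\Big)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}\exp\Big(\frac{\sqrt t}{\sigma}\sum_{i=1}^{n_1}\gamma_iu_i\Big)d\theta\,du\,dt=F\Big(\tfrac12\sum_{i=1}^{n_1}\gamma_i^2\Big). \tag{22}$$

In particular, from $\phi(y,0,\sigma)=0$ and (17) we have

$$(\sqrt2\,\sigma)^{-(n+n_1)}\int_{W_1}\Gamma(y,\theta,u,t)\,p(\theta)\,t^{\frac12(n+n_1-2)}\exp\Big(-\frac{t}{2\sigma^2}\Big)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}d\theta\,du\,dt=\pi^{\frac12n_1}\Gamma(\tfrac12n)\,\epsilon. \tag{23}$$

Letting $W_2$ be the sample space of $\theta$ and $u$, we get respectively from (23) and
(22) that

$$(\sqrt2\,\sigma)^{-(n+n_1)}\int_0^\infty t^{\frac12(n+n_1-2)}\exp\Big(-\frac{t}{2\sigma^2}\Big)dt\int_{W_2}\Gamma(y,\theta,u,t)\,p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}d\theta\,du=\pi^{\frac12n_1}\Gamma(\tfrac12n)\,\epsilon, \tag{24}$$

$$(\sqrt2\,\sigma)^{-(n+n_1)}\int_0^\infty t^{\frac12(n+n_1-2)}\exp\Big(-\frac{t}{2\sigma^2}\Big)dt\int_{W_2}\Gamma(y,\theta,u,t)\,p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}\exp\Big(\frac{\sqrt t}{\sigma}\sum_{i=1}^{n_1}\gamma_iu_i\Big)d\theta\,du=F\Big(\tfrac12\sum_{i=1}^{n_1}\gamma_i^2\Big). \tag{25}$$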


Hence, on developing the left-hand side of (25) into power series in the $\gamma$'s, we
must have


t 
4(n+n,—2+h) 2: i Sam 
\ gtintn—s+ exp( sei) 
#(n—2) / ri h 
«|, Ty, 9, u, t) ) 2(0)(¥— - x 3 ut) ( z= 74s) d@du = 0 for odd h, (26) 
i=1 


re) 
Q—i(n+n,) -mincian | ti(n+2—2)+h 
0 


mM i(n—2) / m 2h 
exp (315) a 4 Ty, 9, u,t) w(0)(1— 5 wf) ( 3 yu) dodu 


n h 
= a( 3) (h=1,2,3,...). (27) 
t=] 


Further, equations (24) and (27) may be written as 


4 t 
ti(n+7—-2) exp | ——— 
| 0 ic: ( 20 i) 


j my i(n—2) ; nin, J (4n) 
r ,9,u,t 6)t1l— ) dOdu— ———*= e |at = 0, 28 
x LI. (_y ) p( i uu u Tin+n,) -€ (28) 










ai t F ™ y 4(n—2) 
| tiin+m—D exp - st] [, I'(y, 0, u, t) p() (1 7 Su) 


0 
™% h 
my 2h ay( 3 7) 
i= 
“ (Ey im) aes I'A(n+n,) +h} 
Equations (28), (26) and (29) must hold true for all positive values of $\sigma$.
Hence, by the theorem of Laplace transformation,* the functions within the
square brackets in (28) and (29) and the inner integral in (26) must vanish
identically:





la =0 (h=1,2,3,...). (29) 


‘ hm 4(n—2) = rae! I'(3n) 
9p E845) p(O)(1- ¥ wf) dOdu = Fray ® (30) 





Nm Hn—2) / m h ? 
I'(y, 4, u, t) y(0)(1 -> ut) ( > 7m) d6du = 0 for odd h, (31) 
i=1 i=1 


Ws 
Mm i(n—2)/ m% 2h 
[_,, Te...) 2(0)(1- 3 ua)" Sym) dodu 
W, i=1 i= 
ny h 
a, ( >> 7) 


i=1 


“SIT (h=1,2,3,...). (32) 





From (31) and (32) we infer that 


J a ne A) - Sua) "exp (Er) wciauine a(3 7). (33) 


i=1 


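Now for any given values of $y$ and $t$ the integral $\int_{W_2}\Gamma(y,\theta,u,t)f(\theta,u)\,d\theta\,du$
equals $\int_{w_2}f(\theta,u)\,d\theta\,du$, where $w_2$ is the set of points in the sample space of $\theta$ and $u$
for which $\Gamma(y,\theta,u,t)=1$. Hence (30) and (33) are equivalent to

$$\int_{w_2}p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}d\theta\,du=\frac{\pi^{\frac12n_1}\Gamma(\tfrac12n)}{\Gamma\{\tfrac12(n+n_1)\}}\,\epsilon, \tag{34}$$

$$\int_{w_2}p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}\exp\Big(\sum_{i=1}^{n_1}\gamma_iu_i\Big)d\theta\,du=G\Big(\sum_{i=1}^{n_1}\gamma_i^2\Big). \tag{35}$$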


The conditions (34) and (35) are necessary and sufficient that the critical region
$w$ should have the properties (a) and (b).
On the other hand, from (12) we have

$$E^2=\sum_{i=1}^{n_1}u_i^2; \tag{36}$$

hence $w_0$ is the region defined by the inequality

$$\sum_{i=1}^{n_1}u_i^2\ge E_\epsilon^2. \tag{37}$$


* Cf. footnote, p. 64. 







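Since $w_0$ is of size $\epsilon$, we must get the same right-hand side of (34) when in the left-hand
side we substitute $w_0$ for $w_2$. Hence

$$\int_{w_2}p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}d\theta\,du=\int_{w_0}p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}d\theta\,du. \tag{38}$$

Let

$$\int_{w_0}p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}\exp\Big(\sum_{i=1}^{n_1}\gamma_iu_i\Big)d\theta\,du=G_0\Big(\sum_{i=1}^{n_1}\gamma_i^2\Big). \tag{39}$$

With the help of (37) and (38) we may now appeal to the lemma proved in the
Appendix and conclude that

$$G(x)\le G_0(x), \tag{40}$$

whence, replacing $\gamma_i$ by $\sigma^{-1}t^{1/2}\gamma_i$ in the integrals in (35) and (39),

$$\int_{w_2}p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}\exp\Big(\frac{\sqrt t}{\sigma}\sum_{i=1}^{n_1}\gamma_iu_i\Big)d\theta\,du\le\int_{w_0}p(\theta)\Big(1-\sum_{i=1}^{n_1}u_i^2\Big)^{\frac12(n-2)}\exp\Big(\frac{\sqrt t}{\sigma}\sum_{i=1}^{n_1}\gamma_iu_i\Big)d\theta\,du. \tag{41}$$

If we multiply both sides of (41) by

$$\frac{e^{-\lambda}}{(\sqrt2\,\sigma)^{n+n_1}\pi^{\frac12n_1}\Gamma(\tfrac12n)}\,p(y)\,t^{\frac12(n+n_1-2)}\exp\Big(-\frac{t}{2\sigma^2}\Big),$$

and integrate over the sample space of $y$ and $t$, we get, in accordance with (18),

$$\beta(\lambda)\le\beta_0(\lambda).$$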





Hence the theorem is proved. 


APPENDIX 


Lemma. Let $g(x)\ge0$ be defined for $x\ge0$ and vanish for $x>1$, such that
$g(v_1^2+\dots+v_n^2)$ is summable. Let $f(w_1,\dots,w_m)\ge0$ be summable. In the product
space of the $v$'s and $w$'s let $R$ be a region such that

$$\int_Rf(w_1,\dots,w_m)\,g(v_1^2+\dots+v_n^2)\exp(y_1v_1+\dots+y_nv_n)\,dv\,dw=G(y_1^2+\dots+y_n^2). \tag{1}$$

Let $R_0$ be the region defined by the inequality

$$v_1^2+\dots+v_n^2\ge k. \tag{2}$$

Let

$$\int_{R_0}f(w_1,\dots,w_m)\,g(v_1^2+\dots+v_n^2)\exp(y_1v_1+\dots+y_nv_n)\,dv\,dw=G_0(y_1^2+\dots+y_n^2). \tag{3}*$$

Finally, let

$$\int_Rf(w_1,\dots,w_m)\,g(v_1^2+\dots+v_n^2)\,dv\,dw=\int_{R_0}f(w_1,\dots,w_m)\,g(v_1^2+\dots+v_n^2)\,dv\,dw. \tag{4}$$

Then

$$G(x)\le G_0(x) \tag{5}$$

for all positive $x$.


* Notice that (3) is not a separate condition on $R_0$, but is implied by (2).







Proof. In (1) we set $y_1=x$, $y_2=\dots=y_n=0$, and get

$$G(x^2)=\int_Rf(w_1,\dots,w_m)\,g(v_1^2+\dots+v_n^2)\exp(xv_1)\,dv\,dw.$$

This and the conditions on $f$ and $g$ imply that $G$ is continuous.
Multiplying both sides of (1) by $\exp(-\sum y_i^2)$ and integrating over the region
$0<a\le\sum y_i^2\le b$, we get

$$K\int_a^bx^{\frac12(n-2)}e^{-x}G(x)\,dx=\int_Rf(w_1,\dots,w_m)\,g\big(\textstyle\sum v_i^2\big)\,dv\,dw\int_{a\le\sum y_i^2\le b}\exp\big(-\textstyle\sum y_i^2+\sum y_iv_i\big)\,dy, \tag{6}$$

where $K$ is some numerical constant. Applying a rotation in the space of the $y$'s
to the inner integral in the right-hand side of (6), we obtain

$$K\int_a^bx^{\frac12(n-2)}e^{-x}G(x)\,dx=\int_Rf(w_1,\dots,w_m)\,g\big(\textstyle\sum v_i^2\big)\,dv\,dw\int_{a\le\sum z_i^2\le b}\exp\big\{-\textstyle\sum z_i^2+(\sum v_i^2)^{1/2}z_1\big\}\,dz=\sum_{h=0}^{\infty}c_hI_h(R),$$

where

$$I_h(R)=\frac{1}{(2h)!}\int_Rf(w_1,\dots,w_m)\,g\big(\textstyle\sum v_i^2\big)\big(\textstyle\sum v_i^2\big)^h\,dv\,dw,\qquad c_h=\int_{a\le\sum z_i^2\le b}z_1^{2h}\exp\big(-\textstyle\sum z_i^2\big)\,dz.$$

Similarly, we have

$$K\int_a^bx^{\frac12(n-2)}e^{-x}G_0(x)\,dx=\sum_{h=0}^{\infty}c_hI_h(R_0).$$

An appeal to a general lemma of Neyman and Pearson,* on remembering
(4), leads to the inequality

$$I_h(R)\le I_h(R_0).$$

Hence

$$\int_a^bx^{\frac12(n-2)}e^{-x}\{G(x)-G_0(x)\}\,dx\le0.$$

Since $a$ and $b$ are arbitrary and since the integrand is a continuous function, the
latter must be $\le0$. Hence $G(x)\le G_0(x)$.


* Neyman and Pearson (1936), p. 11.



















REFERENCES 


Doetsch, G. (1937). Theorie und Anwendung der Laplace-Transformation. Berlin: Julius
Springer.

Fisher, R. A. (1928). "The general sampling distribution of the multiple correlation
coefficient." Proc. Roy. Soc. A, 121, 654-73.

Kołodziejczyk, St. (1935). "On an important class of statistical hypotheses." Biometrika,
27, 161-90.

Neyman, J. & Pearson, E. S. (1936). "Contributions to the theory of testing statistical
hypotheses." Statist. Res. Mem. 1, 1-37.

Tang, P. C. (1938). "The power function of the analysis of variance tests." Statist. Res.
Mem. 2, 126-49.













ON AN OPTIMUM PROPERTY OF TWO IMPORTANT 
STATISTICAL TESTS 


By J. B. SIMAIKA, Ph.D. 


P. L. Hsu (1940) has shown that for any linear hypothesis the $E^2$-test is the
uniformly most powerful of all the tests whose power function depends on a
certain function, $\lambda$, of the population parameters. Two other tests of importance,
namely, those associated with the multiple correlation coefficient and Hotelling's
$T^2$ (Hotelling, 1931), have the similar property of being uniformly more powerful
than all other tests whose power functions depend on the respective functions
of population parameters involved in the distributions of $R^2$ and $T^2$. It is the
purpose of this paper to establish such an optimum property of these two tests.
We shall consider them separately.


I. HOTELLING'S $T^2$


The general problem that calls for the $T^2$-test may be stated in the following
way: given the elementary probability law

$$p(y_1,\dots,y_q,s_{11},s_{12},\dots,s_{qq})=K\,|s_{ij}|^{\frac12m-1}\exp\Big\{-\tfrac12\sum_{i,j=1}^{q}\alpha_{ij}(y_i-\eta_i)(y_j-\eta_j)-\tfrac12\sum_{i,j=1}^{q}\alpha_{ij}s_{ij}\Big\} \tag{1}$$

$$(\alpha_{ij}=\alpha_{ji},\ s_{ij}=s_{ji}),$$

it is required to test the hypothesis that

$$\eta_i=0\quad(i=1,\dots,q). \tag{2}$$

Hotelling's test consists in calculating

$$T^2=\sum_{i,j=1}^{q}s^{ij}y_iy_j, \tag{3}$$

where $s^{ij}$ denotes the general element in the matrix $\|s_{ij}\|^{-1}$, and rejecting the
hypothesis if

$$T^2\ge T_\epsilon^2, \tag{4}$$

where $T_\epsilon^2$ is a constant so determined that the risk of rejecting the hypothesis
when it is true equals $\epsilon$.

The distribution of $T^2$ derived from (1), which conforms with Fisher's (C)
distribution (Fisher, 1928), was obtained independently by Hsu (1938) and Bose
& Roy (1938), and may be written

$$p(T^2\mid\eta,\alpha)=p(T^2\mid\psi^2)=e^{-\psi^2}\sum_{h=0}^{\infty}\frac{(\psi^2)^h}{h!\,B(\tfrac12q+h,\tfrac12m)}(T^2)^{\frac12q+h-1}(1+T^2)^{-\frac12(m+q)-h}, \tag{5}$$

where

$$\psi^2=\tfrac12\sum_{i,j=1}^{q}\alpha_{ij}\eta_i\eta_j. \tag{6}$$
wu = 




Hence the power function of the $T^2$-test is

$$\int_{T_\epsilon^2}^{\infty}p(T^2\mid\psi^2)\,d(T^2), \tag{7}$$

which depends only on the function $\psi^2$ of the $\eta$'s and $\alpha$'s. Our first theorem asserts
that the $T^2$-test is uniformly more powerful than any other test whose power
function is a function of $\psi^2$ alone.


THEOREM I. Let $w_0$ (of size $\epsilon$) be the critical region defined by the inequality (4),
and $w$ be any other critical region whose size is $\epsilon$ and whose power function is a
function of $\psi^2$. Let $\beta(\psi^2)$ and $\beta_0(\psi^2)$ be the power functions of $w$ and $w_0$ respectively.
Then

$$\beta(\psi^2)\le\beta_0(\psi^2). \tag{8}$$


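Proof. Let us first find a necessary and sufficient condition that $w$ should
have the properties described in Theorem I. We have, from (1),

$$p(y,s)=Ke^{-\psi^2}|s_{ij}|^{\frac12m-1}\exp\Big\{-\tfrac12\sum_{i,j=1}^{q}\alpha_{ij}(s_{ij}+y_iy_j)+\sum_{i,j=1}^{q}\alpha_{ij}\eta_iy_j\Big\}. \tag{9}$$

Hence, on setting

$$u_{ij}=s_{ij}+y_iy_j\quad(i,j=1,\dots,q) \tag{10}$$

and

$$\xi_i=\sum_{j=1}^{q}\alpha_{ij}\eta_j\quad(i=1,\dots,q), \tag{11}$$

we have

$$p(y,u)=Ke^{-\psi^2}|u_{ij}-y_iy_j|^{\frac12m-1}\exp\Big(-\tfrac12\sum_{i,j=1}^{q}\alpha_{ij}u_{ij}+\sum_{i=1}^{q}\xi_iy_i\Big) \tag{12}$$

and

$$\psi^2=\tfrac12\sum_{i,j=1}^{q}\alpha^{ij}\xi_i\xi_j, \tag{13}$$

where $\alpha^{ij}$ denotes the general element of the matrix $\|\alpha_{ij}\|^{-1}$.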
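If $w$ is of size $\epsilon$ and has a power function depending only on $\psi^2$, then

$$K\int_w|u_{ij}-y_iy_j|^{\frac12m-1}\exp\Big(-\tfrac12\sum_{i,j=1}^{q}\alpha_{ij}u_{ij}\Big)dy\,du=\epsilon \tag{14}$$

and

$$K\int_w|u_{ij}-y_iy_j|^{\frac12m-1}\exp\Big(-\tfrac12\sum_{i,j=1}^{q}\alpha_{ij}u_{ij}+\sum_{i=1}^{q}\xi_iy_i\Big)dy\,du=e^{\psi^2}\beta(\psi^2)=F(\psi^2),\ \text{say}. \tag{15}$$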

It follows from (15) that, on expanding the left-hand side into a power series in 
the €’s, we must have 


qa / @ h 
| | Wis — Yd; |#"—-1 exp ( -4> Mis us) ( > tui) dydu = 0 for odd h, (16) 
w i,j=1 i=] 


Kk r [uy — iy exp (—4 ~ aytts) (3 Ee) dy du 
(2h)!Jwo 8"? jar N= 


BS any (h = 1,2,3,...), (17) 


where the a, are numbers depending only on the region w chosen. 
















On the other hand, since the integral of (12) over the sample space W is unity, 
we have 


q @ 
Ky Wis — Ycy; | exp ( —4 2 mytgt tu) dydu =e", = (18) 
q 
whence K| | wes — YY; |** exp -+> ast) dydu = 1, (19) 
Ww i,j=1 


y ©. | tm-1 i< ) 5 a d 
(2h)! op | tt E45 | exp ee) = Sim ydu 


és aw (h = 1,2,3,...). (20) 
Combining equations (14) and (19), (17) and (20), we obtain 
I. was — yey, |*"-? exp ( i >a ast) dydu 
= e| | i; — Yc; |*™—* exp ( = > aust) dydu, (21) 


| ig — iy; |? exp(-3 
w i, 


(h = 1,2,3,...). - (22) 


The sample space $W$ is the product space $W(u)\times W(y\mid u)$, where $W(u)$ is the
sample space of the $u$'s and $W(y\mid u)$ is formed of the possible positions of the
point $(y_1,\dots,y_q)$ for given values of the $u$'s. Similarly $w=W(u)\times w(y\mid u)$. If we
evaluate the integrals in (21), (16) and (22) as repeated integrals, we obtain


q 
| exp(-4 >> ast) 
W(u) i,j=1 
{| ay — vary [I> dy —e | Jy — ys |im—edy | du = 0, (23) 
w(y|u) J Wiylu) 


qd q h 
} exp ( -+> ast) | | | Wig — Yes lam | py tu) dy |au = 0 for odd h, 
W(u) i=1 w(y|u) i=1 (24) 


q q 2h 
| exp | —4 2 ays) [ [ | Wis — Yas (> tu) dy — ay, 
W(u) i,j=1 J w(ylu) =1 


v 
q 2h 
«| ee et in > tus) dy |au =0 (h=1,2,3,...). (25) 
W (yl\u) i=1 ] 


Since equations (23), (24) and (25) must hold true for all admissible sets of values 










of the $\alpha_{ij}$, so, according to the lemma proved in the Appendix, the functions
within the square brackets in these equations must vanish identically. Hence


| | wes — Yay; [PI dy = e| | Wes — Yiy; |*™* dy, (26) 
w(y|u) Wy\u) 
@ h 
fol moses lim ( 3 Saye) dy = 0 for odd h, (27) 
wly|u) i=1 
@ 2h q 2h 
{ | wi;— yey; |? (3 cus) dy = a, | . | wis — Yay; | (3 cu) dy 
w(y/u) i=1 Wy/u) i=1 


(h = 1,2,3,...). (28) 

In order to simplify the above equations we notice that the matrix $\|u_{ij}\|$,
being positive definite, can be thrown into the form $CC'$, where $C$ is a non-singular
real matrix. Using the transformation

$$\|y_1,\dots,y_q\|=\|x_1,\dots,x_q\|\,C', \tag{29}$$

we get that
q im—1 a im-1 

{ (1- 5 ai) dx =e (1-5 a3) dx, (30) 

w(z|u) i=l W (z\u) i=1 

, q tm-1/ @ h : 
| (1 -> 2) ( > Ex) dx = 0 for odd h, (31) 
w(a|u) i=1 i=1 


f qd tm-1'{ @ \ 2h 
[mo (1- 22) (26a) ee 
J w(ziu) \ i=1 i=1 
qa $¢m-1 a 2h 
ws a, | (1- S 2) ( d £21) dx (h=1,2,3,...), (32) 
W(2\u) i=1 i=] 


where Il Ea, «++ Sq ll = Ea» ---» SiG. (33) 
Now $W(x\mid u)$ is the region $S$ (independent of the $u$'s) defined by the inequality

$$\sum_{i=1}^{q}x_i^2\le1. \tag{34}$$

Hence the integral in the right-hand side of (30) is a numerical constant, say $b$,
and a rotation in the space of the $x$'s enables the integral in the right-hand side
of (32) to be written as


qa h qa ¢m--1 
(Sa) [a (1-3 at)" ae. (35) 
i-1 / Js i=1 | 
Hence we obtain the following equivalents of equations (30), (31) and (32): 
C qa tm--1 
(2 -> zi) dx = be, (36) 
J w(zlu) i=1 
a jm—-1/ @ h 
J w(z|u) i=1 i=1 


a 4m—1 q@ 2h a h 
| (i-Saa)" (Sea) d= ( Sa) @= 12.3...) (88) 
~ w(zju) \ i=1 i=1 : 


i=1 
where the $b_h$ are numbers depending only on the choice of the region $w$.













The set of equations (36), (37) and (38) give the necessary and sufficient 
condition that the critical region w should have the properties described in 


Theorem I. Further, equations (37) and (38) may be combined into the 
following: 


qd 4m—1 qd q 
Icon (1-228) exp( 3 ers) dx = (32). (39) 
w(z|u) i=1 i=1 i=1 
Now according to (29) we have

$$\sum_{i=1}^{q}x_i^2=\sum_{i,j=1}^{q}u^{ij}y_iy_j=T^2/(1+T^2), \tag{40}$$

where $u^{ij}$ denotes the general element of the matrix $\|u_{ij}\|^{-1}$. Hence $w_0$ is the region
defined by the inequality

$$\sum_{i=1}^{q}x_i^2\ge T_\epsilon^2/(1+T_\epsilon^2). \tag{41}$$
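The identity (40) can be checked numerically. The sketch below is illustrative only: the matrix `s`, the vector `y` and the helper names `solve` and `quad_form_inv` are assumptions, not quantities from the paper. It forms $T^2=\sum s^{ij}y_iy_j$ as in (3), builds $u_{ij}=s_{ij}+y_iy_j$ as in (10), and compares $\sum u^{ij}y_iy_j$ with $T^2/(1+T^2)$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def quad_form_inv(a, y):
    """Return y' a^{-1} y, i.e. the sum over i, j of a^{ij} y_i y_j."""
    return sum(yi * xi for yi, xi in zip(y, solve(a, y)))


# Assumed illustrative data: a positive definite s and an arbitrary y (q = 3).
s = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.5]]
y = [0.3, -0.7, 1.1]

t2 = quad_form_inv(s, y)                                     # T^2 of (3)
u = [[s[i][j] + y[i] * y[j] for j in range(3)] for i in range(3)]
lhs = quad_form_inv(u, y)                                    # sum of u^{ij} y_i y_j
rhs = t2 / (1.0 + t2)                                        # right side of (40)
```

Because $T^2/(1+T^2)$ is an increasing function of $T^2$, the region $T^2\ge T_\epsilon^2$ and the region (41) coincide, which is what the proof uses.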


Since $w_0$ is of size $\epsilon$, we must have the same equation as (36) when $w(x\mid u)$ is replaced
by $w_0$ therein. Hence


qa ¢m—1 qa \ém-1 
| (1- > xt) dx -| (1- > «) dx. (42) 
w(z|u) i=1 We i=1 
c 7] \4m-1- qd qd 
Letting | (1 ~ ¥ 2] exp( >» Ex) dz = uo( } ei), (43) 
We i=1 i=1 i=1 


we deduce with the help of (41), (42) and the lemma proved by P. L. Hsu in the 
Appendix of his paper (1940) that 


(3 et) an (> ei) (*4) 


Applying the transformation reciprocal to (29) to the integrals in (39) and (43), 
we get 


fa 
| | Weg — Yay; |*™-? exp( >> tu) dy <| 
w(y|u) i=1 


w(ylu) 


q 
| ws — yey; |" exp (= tu) dy. 
(45) 
qa 
Hence on multiplying both sides of (45) by $K\exp(-\psi^2)\exp\big(-\tfrac12\sum_{i,j=1}^{q}\alpha_{ij}u_{ij}\big)$ and
integrating over $W(u)$ and remembering (15), we have the inequality (8):

$$\beta(\psi^2)\le\beta_0(\psi^2),$$

which was to be proved.







II. MULTIPLE CORRELATION COEFFICIENT
In this connection the basic elementary probability law is taken to be 








P(2, Ya» --+sYqo Tra» Zig, --+)%qq) = K(1—p*)*| 2 yy... Yo‘ (Mm-#-8) 
| 
"% Ty Vg 
Me ws RY 
q @ 
xexp( dre EA 1 S aueu) (1) 
= (= 
(a; oe X55 Vij = Xj), 
ae 
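where

$$\rho^2=\frac{1}{\gamma}\sum_{i,j=1}^{q}\alpha^{ij}\beta_i\beta_j \tag{2}$$

is the square of the multiple correlation coefficient of the population. We have

$$\begin{vmatrix}z&y_1&\dots&y_q\\y_1&x_{11}&\dots&x_{1q}\\\vdots&\vdots&&\vdots\\y_q&x_{q1}&\dots&x_{qq}\end{vmatrix}=|x_{ij}|\Big(z-\sum_{i,j=1}^{q}x^{ij}y_iy_j\Big), \tag{3}$$

$$R^2=\frac{1}{z}\sum_{i,j=1}^{q}x^{ij}y_iy_j. \tag{4}$$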
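The determinant identity (3) and the statistic (4) can be illustrated on a small numerical case. This is a sketch under assumed values: the numbers `z`, `y`, `x` and the helper names `det` and `solve` are chosen for illustration and do not come from the paper.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    d = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            d = -d  # a row interchange flips the sign
        d *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= f * m[col][k]
    return d


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * sol[k] for k in range(i + 1, n))
        sol[i] = (m[i][n] - tail) / m[i][i]
    return sol


# Assumed illustrative values with q = 2.
z = 1.0
y = [0.4, 0.3]
x = [[2.0, 0.5], [0.5, 1.0]]

quad = sum(yi * wi for yi, wi in zip(y, solve(x, y)))  # sum of x^{ij} y_i y_j
bordered = [[z] + y] + [[y[i]] + x[i] for i in range(2)]

lhs = det(bordered)            # left-hand side of (3)
rhs = det(x) * (z - quad)      # right-hand side of (3)
r_squared = quad / z           # R^2 as in (4)
```

When the bordered matrix is positive definite, $z-\sum x^{ij}y_iy_j>0$, so $R^2$ computed this way lies between 0 and 1.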
The hypothesis to be tested is that

$$\beta_i=0\quad(i=1,\dots,q). \tag{5}$$


THEOREM II. The basic elementary probability law and the hypothesis under test
being given by (1) and (5), let $w_0$ be the critical region of size $\epsilon$ defined by the inequality

$$R^2\ge R_\epsilon^2 \tag{6}$$

and $w$ be any other critical region whose size is $\epsilon$ and whose power function depends
only on $\rho^2$. Let $\beta(\rho^2)$ and $\beta_0(\rho^2)$ be the power functions of $w$ and $w_0$ respectively.
Then

$$\beta(\rho^2)\le\beta_0(\rho^2). \tag{7}$$

Proof. Suppose that $w$ has the properties described in Theorem II. Then


' q ae 4(n—q—2) q 
K | | arg; | -9(2— >> x y.y,) exp | —tyz-4 > a) dzdydz = ¢, 
Jw } 1 \ i.j=1 
(8) 


i,j= 


5 P ee ee (n—g—2) 
K| jaye (2- x7 YY; 
w j= 


j=1 


@ q 
xexp(—iy2—} Le by Xeyz— 3 Ay) dzdydx 
ij=1 i=1 


j= 


= (1—p*)-#* B(p*) = F(p*), say. (9) 













Hence, on developing the left-hand side of (9) into a power series in the /’s, 
we have 


® qd ‘ (n—q—2) 
[zy |He-e-9(2— x xe yy. 
w i,j=1 


h 
x exp( 472-4 y aust) ( > Bis) dzdydx = 0 for odd h, (10) 
_K i(n—q—2) . ij iia-c-8) 
iyi) yal "(2-2 270) 
q i @ 2h 
x exp ( -—hhyz- + y ayy) (4 us) dzdydx 


_ I(gn+h) ie 
where the a, are numbers depending only on the choice of the region w. 
On the other hand, since the integral of (1) over the whole sample space W 
is unity, we have 


a 
K{ |zylie-e-9(2— x al yy; 
Ww i,j=1 


(h = 1,2,3,...), (11) 


‘ee 


q q 
x exp ( —hyz- + “ a Shu) dzdydx = (1—p*)-, (12) 
oy = = 
whence . 


‘ . to 4(n—q—2) 
x{ |zy|ie-2-9(2— = gt v.vs) 
Ww i,j=1 


qa 
exp ( a syz ra t E ayy] dzdydx = iF 


(13) 
_K_ (n—q—-2) . ij ew 
(2h)! iy at em % Yi¥; 


+ 


t 


qa 2h 
ast) (3 An) dzdydx 


rk h 
- are (h = 1,2,3,...). (14) 


@ 
coxo(—bre-4,$ 


1 


Combining equations (8) and (13), (11) and (14), we obtain 

g i(n—g—2) 
| [zy [ioe (2— b>} yy) 

w 1,j=1 
q 
x exp( —hyz-4 > ayy) dzdydx = e| (...)dzdydz, (15 
i,j=1 Ww 

qa i(n—gq—2) 

{ | 24, [n-e-0 (2 *. ye viv) 


qa qd 2h 
x exp ( —tyz-—3 a ay) (> Aw) dzdydx 


i 1 


= ay [| (..)dedyde (h = 1,2,8,...), (16) 




























where the unwritten integrands in the right-hand sides are the same as those 
in the left-hand sides. 


As before we argue that W = W(z,x) x W(y|z,x), w = W(z,x) x w(y|z,x) and 
evaluate the integrals in (15), (10) and (16) as repeated integrals. It follows that 


i | x; |«"—-2-2) exp —4hyz-3 > AssLeg 2— s xtiy.y ni 
Wee.2) aj 2 2. ij ij item is Yj 
YZ, z, 


i,j=1 i,j=1 


q = 
-e{ (-- Xe" yy; 
Wyle, x) i,j=1 


qd 
x,;|*#"-2-2 exp | — dyz—-— ag) 
ee | i ; ( ty 2, sia 


i(n—q—2) 
) dy |deae =0, (117) 


q a i(n—q-2) / @ h 
| (2 > yas) ( > Bus) dy \dzdx = 0 for odd h, (18) 
w(y|z, x) i,3 i=1 


i,j=1 


q 
| | ayy [KD exp ( —tyz-4t & args) 
We, 2) j=l 


i,j= 


/ Ey: \m—-a-2) / ¢ 2h 
il (2- ? > x y.y;) (= us) dy 
-/ w(ylz, x) i,j=1 / i=1 


er 


hd 


: ‘ 
=a | (2- DL 2X yy; 
W(ylz, 2) 


i,j=1 / 


(= pias) |dydede =0 (h=1,2,3,...). 
i=1 sss 


According to the lemma proved in the Appendix, the functions within the 
square brackets in the above equations must vanish identically. Hence 


( qa tok Kn—q—2) 
a) a 2 7 (20) 
w(y|z, 2) i,j=1 W(ylz, x) 
{ a = (n—q-2) / @ h 
| (2- > ai vis) ( > bus) dy = Oforoddh, (21) 
w(y|z,z) \ i,j=1 i=1 


qd ou i(n—g-2) / @ 2h 
{ (2- mh x vis) (3 By) dy 
w(y|z, x) i,j=1 i=1 


=a, | (...)dy (h=1,2,3,...). (22) 
w(y|z, 2) 


In order to simplify the above equations we notice that, since the matrix
$\|x_{ij}\|$ is positive definite, it can be thrown into the form $CC'$, where $C$ is a non-singular
real matrix. Using the transformation

$$\|y_1,\dots,y_q\|=z^{1/2}\,\|t_1,\dots,t_q\|\,C', \tag{23}$$













qa 4(n—g—2) 

we obtain { (1 2— a) dt = e| (...) dt, (24) 

w(t\z, x) i=1 Welz, z) 

qa (n—q—-2) / @ h 
| (1 -> a) ( ) rity) dt = 0 for odd h, (25) 
wi(tlz, x) i=1 i=1 J 
q in—g—-2) / @ 2h 
(1- 54) (3744 at = a, (...)dt (h=1,2,3,...), (26) 
w(tlz, 2) i=1 i=1 Wille, x) 
where ly, «++» Tq ll = 241A, ..., Bg IC. 


Now $W(t\mid z,x)$ is the region $S$ (independent of $z$ and the $x$'s) defined by the
inequality

$$\sum_{i=1}^{q}t_i^2\le1. \tag{27}$$

Hence the integral on the right-hand side of (24) is a numerical constant, say $b$,
and a rotation in the space of the $t$'s enables the integral on the right-hand side
of (26) to be written as


qa h @ i(n—q—2) 
( > ) | (1 -> a) #2” dt. (28) 
i=1 Ss i=1 
Hence we have the following equivalents of (24), (25) and (26): 

q 4(n—q—2) 

} (1 ->d a) dt = be, (29) 
w(t[z, x) t=1 

qa i(n—q—2) / @ h 

| (1 ->d a) ( yi rt) dt = 0 for odd h, (30) 
wi(t|z, x) i=1 i=1 


qa i(n—g—-2) / @ 2h a h 
| (1- 3) (3 7.4) at = b,( 3 71) (h = bay accel tee) 
ut|z, 2) i=1 i=1 i=1 


where the $b_h$ are numbers depending only on the choice of $w$.
Equations (30) and (31) may be combined into the following one: 


q i(n—q—-2) q q 
| (1- > #) exp(- 3 rt,) at = a( > 1). (32) 
wit\z, 2) i=1 i=1 i=1 


Now, by (4) and (23), we have

$$R^2=\sum_{i=1}^{q}t_i^2; \tag{33}$$

consequently the region $w_0$ is defined by the inequality

$$\sum_{i=1}^{q}t_i^2\ge R_\epsilon^2. \tag{34}$$
i=1 
















Since $w_0$ is of size $\epsilon$, we must have the same equation (29) when $w(t\mid z,x)$ is
replaced by $w_0$ therein. Hence


@ \ «m—q—2) @ 4(n—g—2) 
| (1- > 4) a= | (1- > 4) dt. (35) 
w(t|z, x) i=1 Wo t=1 
q \kKn—g-2) q q 
On setting | (1 — x a) exp ( _ = T; t) dt = @( & ri), (36) 
We i= i= i= 


we infer, with the help of (34), (35) and the lemma proved by P. L. Hsu in the 
Appendix of his paper (1940), that 


G (3 a) <G, (3 a). (37) 


Hence, using the transformation reciprocal to (23) to the integrals in (32) and 
(36), we have 


{ q : i(n—gq—2) : qd - 
| (2— - >» oyu) exp( ~ E Aus)av< | (...)dy. (38) 
w(ylz, x) i,j=1 \ ¢=1 


wolylz, 2) 
Multiplying both sides of (38) by 
q 
K(1—p*)® | zy |e exp( —tyz-—} D 252) 


\ i,j=1 


and integrating over the space $W(z,x)$ and remembering (9), we obtain

$$\beta(\rho^2)\le\beta_0(\rho^2).$$


Therefore Theorem II is proved. 


I am gratefully indebted to Dr P. L. Hsu for putting this problem before me
and for his helpful suggestions both in the course of my research and in preparing
this paper for publication.


APPENDIX 


Lemma. Let $E(x)$ be the set of points $(x_{11},x_{12},\dots,x_{qq})$ for which the symmetric
matrix $\|x_{ij}\|$ is positive definite. Then

$$\int_{E(x)}|\phi(x)|\exp\Big(-\sum_{i=1}^{q}x_{ii}\Big)dx<\infty\quad(i,j=1,\dots,q) \tag{1}$$

and

$$\int_{E(x)}\phi(x)\exp\Big(-\sum_{i,j=1}^{q}\alpha_{ij}x_{ij}\Big)dx=0\ \text{throughout}\ E(\alpha)\quad(\alpha_{ij}=\alpha_{ji},\ x_{ij}=x_{ji}) \tag{2}$$

imply that

$$\phi(x)=0\ \text{almost everywhere in}\ E(x). \tag{3}$$













Proof. Suppose that both (1) and (2) are true. Since the matrix $\|\delta_{ij}+\theta_{ij}\|$,
where $\delta_{ii}=1$ and $\delta_{ij}=0$ ($i\ne j$), $\theta_{ij}=\theta_{ji}$, is positive definite for all sufficiently
small real $\theta$'s; so, by (2),

$$\int_{E(x)}\phi(x)\exp\Big(-\sum_{i=1}^{q}x_{ii}\Big)\exp\Big(-\sum_{i,j=1}^{q}\theta_{ij}x_{ij}\Big)dx=0 \tag{4}$$

for all sufficiently small real $\theta$'s. By (1) the left-hand side of (4) is an analytic
function of each of the $\theta$'s in the neighbourhood of the imaginary axis. By
analytic continuation (4) must remain true for all complex $\theta$'s with sufficiently
small real parts. In particular,

$$\int_{E(x)}\phi(x)\exp\Big(-\sum_{i=1}^{q}x_{ii}\Big)\exp\Big(\sqrt{-1}\sum_{i,j=1}^{q}t_{ij}x_{ij}\Big)dx=0\quad(t_{ij}=t_{ji}) \tag{5}$$

for all real values of the $t$'s. Hence, by the well-known property of the Fourier
transform,

$$\phi(x)\exp\Big(-\sum_{i=1}^{q}x_{ii}\Big)=0\ \text{almost everywhere in}\ E(x),$$

which implies (3).


REFERENCES 


Bose, R. C. & Roy, S. N. (1938). "The distribution of the 'Studentized' D²-statistic."
Proceedings of the Indian Statistical Conference, Calcutta, pp. 19-38.

Fisher, R. A. (1928). "The general sampling distribution of the multiple correlation
coefficient." Proc. Roy. Soc. A, 121, 654-73.

Hotelling, H. (1931). "The generalization of Student's ratio." Ann. Math. Statist. 2,
359-78.

Hsu, P. L. (1938). "Notes on Hotelling's generalized T." Ann. Math. Statist. 9, 231-43.

Hsu, P. L. (1940). "Analysis of variance from the power function standpoint." Biometrika, 32,
62-9.




























MISCELLANEA 


(i) A recurrence relation for the semi-invariants of 
Pearson curves 


By M. G. KENDALL 


The Pearson curves are defined by the differential equation

$$dy=\frac{y(a+x)\,dx}{b_0+b_1x+b_2x^2}.$$

Multiplying by $e^{tx}(b_0+b_1x+b_2x^2)$ and integrating over the range of the distribution, we have

$$\int e^{tx}y(a+x)\,dx=\int(b_0+b_1x+b_2x^2)e^{tx}\,dy=\big[(b_0+b_1x+b_2x^2)e^{tx}y\big]-\int y\,dx\,e^{tx}\{b_1+2b_2x+t(b_0+b_1x+b_2x^2)\}.$$

At the extremes of the distribution we may suppose the expression in square brackets on
the right to vanish and hence

$$\int e^{tx}y\{a+b_1+b_0t+(1+2b_2+b_1t)x+b_2tx^2\}\,dx=0. \tag{1}$$

The moment generating function of the distribution, $M(t)$, is given by

$$M(t)=\int e^{tx}y\,dx,$$

and hence $\dfrac{dM}{dt}=\displaystyle\int xe^{tx}y\,dx$, etc. Thus from (1)

$$b_2t\frac{d^2M}{dt^2}+(1+2b_2+b_1t)\frac{dM}{dt}+(a+b_1+b_0t)M=0, \tag{2}$$

a linear differential equation of the second order, which may also be regarded as defining the
Pearsonian system.

Incidentally, it would be interesting from the theoretical view-point to consider classes 
of frequency distributions defined by differential equations in their moment or semi-invariant 
generating functions. 

So far as I know there is no solution of (2) in ordinary functions which would permit of
the explicit expression of the coefficient of $t^r$ in $M(t)$; but from a consideration of the
coefficient of $t^r$ in (2) we have

$$\{1+(r+2)b_2\}\mu'_{r+1}+\{a+(r+1)b_1\}\mu'_r+rb_0\mu'_{r-1}=0, \tag{3}$$

the well-known recurrence relation between the moments of Pearson curves.
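The recurrence (3) lends itself to direct computation. The sketch below is illustrative: the function name `pearson_raw_moments` and the two test curves are assumptions. It takes the curve in the form $dy/dx=y(a+x)/(b_0+b_1x+b_2x^2)$, so that a normal curve with variance $\sigma^2$ corresponds to $a=0$, $b_0=-\sigma^2$, $b_1=b_2=0$, and a Beta curve with exponents $p-1$ and $q-1$ corresponds to $a=-(p-1)/(p+q-2)$, $b_0=0$, $b_1=-1/(p+q-2)$, $b_2=1/(p+q-2)$.

```python
def pearson_raw_moments(a, b0, b1, b2, r_max):
    """Moments mu'_0, ..., mu'_{r_max} about the origin from recurrence (3).

    Starts from mu'_0 = 1 and solves (3) for mu'_{r+1} at r = 0, 1, 2, ...
    """
    mu = [1.0]
    for r in range(r_max):
        previous = mu[r - 1] if r >= 1 else 0.0  # the r * b0 term vanishes at r = 0
        numerator = (a + (r + 1) * b1) * mu[r] + r * b0 * previous
        mu.append(-numerator / (1.0 + (r + 2) * b2))
    return mu


# Normal curve with variance 2 (assumed example): moments 1, 0, 2, 0, 12.
normal = pearson_raw_moments(0.0, -2.0, 0.0, 0.0, 4)

# Beta curve with p = 2, q = 3 (assumed example): mean 0.4, second moment 0.2.
beta = pearson_raw_moments(-1.0 / 3.0, 0.0, -1.0 / 3.0, 1.0 / 3.0, 2)
```

Both examples reproduce the familiar moments of the corresponding distributions, which gives a quick check on the signs in (3).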
Some simplification of this expression is possible by the choice of a particular origin in
certain cases. If the roots of

$$b_0+b_1x+b_2x^2=0$$

are real, it is possible by a real linear transformation to transform the equation defining the
Pearson curves to one which does not involve $b_0$. With the origin defined by this transformation,
we have

$$\{1+(r+2)b_2\}\mu'_{r+1}+\{a+(r+1)b_1\}\mu'_r=0,$$

giving

$$\mu'_r=(-1)^r\frac{(a+rb_1)(a+\overline{r-1}\,b_1)\dots(a+b_1)}{(1+\overline{r+1}\,b_2)(1+rb_2)\dots(1+2b_2)}. \tag{4}$$













Putting $K=\log M$ in (2), we have for the semi-invariant generating function

$$b_2t\Big\{\frac{d^2K}{dt^2}+\Big(\frac{dK}{dt}\Big)^2\Big\}+(1+2b_2+b_1t)\frac{dK}{dt}+(a+b_1+b_0t)=0. \tag{5}$$

This is not linear, and it appears therefore that there is no simple recurrence relation among
the semi-invariants as among the moments. The equation is similar in character to that
known as Riccati's and the usual way of solving it would be to return to the linear equation
(2) from which it was derived.

Taking an origin at the mean ($\kappa_1=0$) and considering the coefficient of $t^r$ in (5), we have

$$\frac{b_2\kappa_{r+1}}{(r-1)!}+b_2\Big\{\frac{\kappa_2\kappa_{r-1}}{1!\,(r-2)!}+\frac{\kappa_3\kappa_{r-2}}{2!\,(r-3)!}+\dots+\frac{\kappa_{r-1}\kappa_2}{(r-2)!\,1!}\Big\}+(1+2b_2)\frac{\kappa_{r+1}}{r!}+b_1\frac{\kappa_r}{(r-1)!}=0,$$

or

$$\{1+(r+2)b_2\}\kappa_{r+1}+rb_1\kappa_r+rb_2\Big\{\binom{r-1}{1}\kappa_2\kappa_{r-1}+\binom{r-1}{2}\kappa_3\kappa_{r-2}+\dots+\binom{r-1}{1}\kappa_{r-1}\kappa_2\Big\}=0, \tag{6}$$

with the initial relation $\kappa_2=-b_0/(1+3b_2)$.


Equation (6) seems to be as simple a recurrence relation as we can expect for the expression
of a semi-invariant in terms of those of lower order.
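Relation (6), together with its initial relation, can be run as a short program. This is a sketch only: the function name `pearson_cumulants` and the two example curves are assumptions, and the constants $b_0$, $b_1$, $b_2$ are taken to refer to the curve written with its origin at the mean, in the form $dy/dx=y(a+x)/(b_0+b_1x+b_2x^2)$. On that footing Student's curve with $\nu$ degrees of freedom has $b_0=-\nu/(\nu+1)$, $b_1=0$, $b_2=-1/(\nu+1)$, and a Type III (gamma) curve with index $k$ and scale $\theta$ has $b_0=-k\theta^2$, $b_1=-\theta$, $b_2=0$.

```python
import math


def pearson_cumulants(b0, b1, b2, r_max):
    """Semi-invariants kappa_1, ..., kappa_{r_max} about the mean from (6).

    Returns a dict keyed by order; kappa_1 = 0 by the choice of origin.
    """
    kappa = {1: 0.0, 2: -b0 / (1.0 + 3.0 * b2)}
    for r in range(2, r_max):
        # Product terms kappa_2 kappa_{r-1}, kappa_3 kappa_{r-2}, ..., as in (6).
        cross = sum(
            math.comb(r - 1, j) * kappa[j + 1] * kappa[r - j]
            for j in range(1, r - 1)
        )
        kappa[r + 1] = -(r * b1 * kappa[r] + r * b2 * cross) / (1.0 + (r + 2) * b2)
    return kappa


# Student's curve with nu = 10 (assumed example).
student = pearson_cumulants(-10.0 / 11.0, 0.0, -1.0 / 11.0, 4)

# Type III curve with k = 3, theta = 2 (assumed example).
gamma = pearson_cumulants(-12.0, -2.0, 0.0, 4)
```

For Student's curve the fourth semi-invariant equals $6\kappa_2^2/(\nu-4)$, and for the Type III curve the values are $k\theta^2$, $2k\theta^3$, $6k\theta^4$, so the product terms in (6) matter only when $b_2\ne0$.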


(ii) A comparison of annual and biennial inflorescences of 
Daucus carota (wild carrot) 


By WILLIAM DOWELL BATEN 
Michigan State College 


INTRODUCTION 


In 1932 seeds from Michigan and Indiana were gathered from Daucus carota for the purpose 
of studying environmental effects on the numbers of pedicels and bracts per inflorescence 
from plants grown from Michigan and Indiana seeds. In 1933 these seeds were planted in the 
botanical gardens of the University of Michigan in the greenhouse and later planted outside.
In 1933, 44% of the plants bloomed; in 1934, 17% of those that did not bloom the first season
survived the winter and bloomed. Results of this study were published by the present writer
(1934) in an article entitled ‘‘A statistical study of Daucus carota”’, in which the numbers 
of pedicels and bracts on annual and biennial inflorescences coming from these seeds were 
compared. At the end of the article K. Pearson pointed out that since the seeds were taken 
from many plants in the wild, some of the seeds might have come from flowers blooming the 
first season and others from those blooming the second season, that one did not know how 
many annual and biennial seeds came from the two states and that the comparisons might 
not be the same if this was considered. 

To overcome this just criticism seeds were taken from one plant near Ann Arbor, 
Michigan in 1936. These were planted in 1937 in the greenhouse at Michigan State College and 
later planted outside in rows 3 ft. apart and 3 ft. apart in the rows. During the latter part of 
the summer, counts were made on plants blooming the first year of the number of branches, 
the number of inflorescences, and the number of primary pedicels and bracts per inflorescence
on the stem and first eight branches below the stem terminal cluster. During the summer 
















of 1938 similar counts were made on the plants blooming the second year. The object of
this article is to compare the counts pertaining to the annual and biennial inflorescences.

During the first flowering season 60% of the plants bloomed. At the end of this season 
the plants which did not bloom appeared to be in good condition for the coming winter. 
In 1938 biennial flowers appeared much earlier than the annual flowers in 1937. Counts of 
the annuals were made in 1937 in August and September; counts of the biennials were made 
in July and the first part of August. 

The terminal inflorescence on the stem will be designated by $T$, the first branch terminal
inflorescence by $A$, the first non-terminal inflorescence on the first branch by $A_1$, etc. Branches
are considered in descending order below the stem terminal. According to this notation,
$D_3$ represents the third non-terminal inflorescence on the fourth branch. Umbels in this
article will always mean primary umbels and pedicels or rays will always mean primary
pedicels or rays.


SIZE OF ANNUAL AND BIENNIAL PLANTS 


The following averages pertain to the number of branches and inflorescences (including
buds) of annual and biennial plants.

Parts                            Annuals (1937)    Biennials (1938)
Average no. of branches          6.3               20.3
Average no. of inflorescences    7…




These averages indicate that the biennial plants were much larger as to number of 
branches and inflorescences than the annual plants. The second year herbs were considerably 
taller than those blooming the first season. 

In 1938 most of the branches used in making the counts had four umbels, whose parts 
could be enumerated; in 1937 very few of these had four umbels which were mature enough to 
use. Very few of the first branches belonging to 1937 plants produced more than two non- 
terminal umbels; a good percentage of corresponding branches of 1938 plants possessed more 
than two. Counts were made on seventy-seven plants during the first season and on seventy- 
six during the second. In the second summer there were several plants with more than 500 
inflorescences and one with 796; the largest in 1937 had 183. 

In 1937, 37.7 % of the herbs had at least eight branches; in 1938, 72.4 % had at least
eight similar branches. In the first flowering season 46.8 % of the plants
had at least six branches; during the second season 90.8 % had at least six branches.
Of the annual plants 74.0 % had at least four branches, and of the biennials 97.4 % had
at least four. These figures show that the biennial plants were more completely filled out than
the annuals.


SIZE OF INFLORESCENCES 


Table 1 contains averages pertaining to the number of bracts per umbel on the stem and
the first three branches. On the average the number of bracts on the stem and branch terminal
clusters of biennials is significantly larger than on similar annual clusters. The average size
(in number of bracts) of stem umbels for annuals was 10.9 bracts; that for biennials was 11.9
bracts. The average number of bracts per branch terminal was less than 10.3 bracts during
1937 and greater than 11.4 bracts during 1938. These figures and figures pertaining to the
first eight branches indicate that the averages of the number of bracts on the majority of
the biennial clusters are significantly larger than similar averages with respect to annual
clusters.
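The note does not say which test of significance was applied to these averages. As a minimal sketch, assuming a large-sample normal test for the difference of two independent means, the comparison for the stem terminal T can be reproduced from the summary figures of Table 1 (77 annual umbels, mean 10.9, standard deviation 1.22; 76 biennial umbels, mean 11.9, standard deviation 1.35). The function name `two_sample_z` is illustrative only.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample z statistic for the difference of two independent means.

    Assumption: the samples are large enough for the normal approximation;
    the original note does not state the test it used.
    """
    # Standard error of the difference, without pooling the variances.
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean2 - mean1) / se
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Bracts per stem terminal umbel T: annuals (1937) against biennials (1938).
z, p = two_sample_z(10.9, 1.22, 77, 11.9, 1.35, 76)
print(f"z = {z:.2f}, two-sided p = {p:.2g}")
```

A difference of one bract is several standard errors wide under these assumptions, which agrees with the statement that the biennial averages are significantly larger.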





















































































Table 1. Averages and standard deviations pertaining to the number of bracts
per umbel on the stem and first three branches

1937
                   |  T   |  A   |  A₁  |  A₂  |  A₃  |  B   |  B₁  |  B₂  |  B₃  |  C   |  C₁  |  C₂   |  C₃
Number             |  77  |  77  |  55  |  18  |  -   |  76  |  59  |  49  |  -   |  68  |  52  |  39   |  -
Average            | 10.9 |  9.9 |  9.4 |  8.6 |  -   | 10.4 |  9.5 |  9.5 |  -   | 10.3 |  9.4 |  9.4? |  -
Standard deviation | 1.22 | 1.39 | 1.22 | 1.01 |  -   | 1.65 | 1.33 | 1.30 |  -   | 1.42 | 1.21 | 1.27  |  -

1938
                   |  T   |  A   |  A₁  |  A₂  |  A₃  |  B   |  B₁  |  B₂  |  B₃  |  C   |  C₁  |  C₂   |  C₃
Number             |  76  |  76  |  38  |  27  |   7  |  75  |  30  |  29  |  15  |  71  |  30  |  25   |  21
Average            | 11.9 | 11.4 |  9.8 | 10.3 | 11.0 | 11.6 | 10.1 | 10.7 | 11.0 | 11.9 | 10.2 | 10.9  | 11.2
Standard deviation | 1.35 | 1.20 | 1.43 | 1.63 | 1.77 | 1.20 | 1.41 | 1.44 | 1.15 | 1.10 | 1.32 | 1.45  | 1.23





Biennial stem terminals had on the average significantly more pedicels than annual stem
terminals; these averages are:

Annuals          Biennials
55.6 pedicels    67.8 pedicels


Branch terminals of biennials have on the average significantly more rays than similar ones 
on annuals. Fig. 1 allows the eye to see at once how these averages compare; the heights of 


the bars on the left represent the averages for the annuals. The bars on the left are shorter 
in every case. 


[Figure 1: bar chart. Vertical axis: number of pedicels, 0 to 60. Horizontal axis: paired bars a and b for each of the branch terminals A to H.]

Fig. 1. Averages pertaining to number of pedicels per branch terminal umbel for annuals and biennials. a, annuals; b, biennials.


Many of the averages of the numbers of pedicels on non-terminal biennial clusters are 
significantly larger than those belonging to corresponding annual clusters. The above indi- 
cates that biennial inflorescences (in number of bracts and pedicels) are significantly larger 













than corresponding annual inflorescences, showing again that Daucus carota herbs blooming 
the second season are on the average much larger than those blooming the first. 

On examining the average number of pedicels per umbel it is found that branch non-terminal
umbels have on the average a smaller number of pedicels than the corresponding branch
terminals; for example, A₁ and A₂ have significantly fewer rays than A. This was
true for the other branches. The 1938 averages for C, C₁, C₂ and C₃ are as follows:

C            C₁           C₂           C₃
61.7 rays    45.3 rays    49.3 rays    51.1 rays

Similar figures were found for the other branches. The averages pertaining to pedicels are
shown in Table 2.


Table 2. Averages and standard deviations pertaining to the number of pedicels
per umbel on the stem and first three branches

1937
                   |   T   |   A   |  A₁   |  A₂   |  A₃   |   B   |  B₁   |  B₂   |  B₃   |   C   |  C₁   |  C₂   |  C₃
Number             |  77   |  77   |  55   |  18   |   -   |  76   |  59   |  49?  |   -   |  68   |  52   |  39?  |   -
Average            | 55.6  | 53.2  | 44.5  | 40.2  |   -   | 54.3  | 46.0  | 46.5  |   -   | 55.6  | 46.8  |   ?   |   -
Standard deviation | 12.65 | 10.98 |  7.40 |  6.96 |   -   | 11.32 |  8.81 |  9.37 |   -   | 12.20 |  8.70 |  8.18 |   -

1938
                   |   T   |   A   |  A₁   |  A₂   |  A₃   |   B   |  B₁   |  B₂   |  B₃   |   C   |  C₁   |  C₂   |  C₃
Number             |  76   |  76   |  33   |  27   |   7   |  75   |  30   |  29   |  15   |  71   |  30   |  25   |  21?
Average            | 64.26 | 57.01 | 45.06 | 48.07 | 48.43 | 60.77 | 47.23 | 49.45 | 47.13 | 61.69 | 45.33 | 49.32 | 51.14
Standard deviation | 13.10 |  9.78 |  9.98 |  5.27 |  8.79 | 10.58 |  9.93 | 10.31 | 10.60 | 11.12 | 11.62 | 10.07 |  9.07





CORRELATION


The Pearson linear correlation coefficients between the number of bracts and the number
of rays for various annual and biennial umbels are as follows:

Umbel      |   T   |   A   |   B   |   C
Annuals    | 0.406 | 0.410 | 0.496 | 0.571
Biennials  | 0.655 | 0.612 | 0.590 | 0.549





All of these coefficients are significantly different from zero, showing that there is a definite
relation between the number of bracts and the number of rays. There are no significant
differences between the correlation coefficients pertaining to annuals and biennials except that
for T, which is barely significant at the 5 % level. These values suggest that the size of the
plant and season do not affect the relation between the number of bracts and rays per umbel.
Similar figures were found in other investigations of this species (Baten, 1934). The position
of the umbels on the plant also does not affect the relation between bracts and rays.
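The method behind these significance statements is not given in the note. One common approach, sketched here as an assumption rather than as the author's procedure, is Student's t for a single coefficient and Fisher's z transformation for the difference of two independent coefficients. The coefficients for T (0.406 for annuals, 0.655 for biennials) are taken from the text; the sample sizes 77 and 76 are the numbers of T umbels in Table 1, on the further assumption that every such umbel supplied both counts.

```python
import math


def t_for_correlation(r, n):
    """Student's t (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


def fisher_z_difference(r1, n1, r2, n2):
    """Normal deviate and two-sided tail area for r2 - r1, using Fisher's z.

    Assumption: the two samples are independent (different seasons).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    stat = (z2 - z1) / se
    p = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p


# Bracts against rays on the stem terminal T.
t_annual = t_for_correlation(0.406, 77)
stat, p = fisher_z_difference(0.406, 77, 0.655, 76)
print(f"t (annuals, r against 0) = {t_annual:.2f}")
print(f"difference: z = {stat:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the difference for T gives a normal deviate a little above 2, in keeping with the description "barely significant at the 5 % level".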













The coefficients of correlation between the number of bracts on T and on the other 
clusters pertaining to annuals and biennials are about the same and are significant, indicating 
a real association between bracts on stem and branch terminals and similarly for rays. The 
values of r_TB are:

        |        Bracts         |         Rays
        | Annuals  | Biennials  | Annuals  | Biennials
r_TB    |  0.644   |   0.596    |  0.849   |   0.741





Size of herb and season have no effect on the relation between bracts and rays on T and on
A, B and C.

The degree to which the number of bracts on branch non-terminals depends on the number
of bracts on the terminals of the first three branches was measured by the correlation
coefficients between these respective numbers. There are no significant differences between
these coefficients, suggesting that the size of plants and seasons do not affect the relation
between the number of bracts on branch terminals and branch first non-terminals.
This also was true for rays.

The following figures are the coefficients of correlation between the number of bracts and 
rays on first and second branch terminal inflorescences. 








Description          |        Bracts         |         Rays
                     | Annuals  | Biennials  | Annuals  | Biennials
r_AB (interclass)    |  0.748   |   0.734    |  0.916   |   0.856
r_AB (intraclass)    |  0.701   |   0.710    |  0.896   |   0.786


These values indicate no significant differences between the correlations pertaining to annual 
and biennial inflorescences. They do suggest a rather high correlation between the number of 
bracts and rays on first and second branch primary umbels. 
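The distinction between the two kinds of coefficient can be made concrete with a short sketch. The interclass coefficient is the ordinary product-moment correlation between the counts on A and on B; the intraclass coefficient is computed here by the double-entry method, in which each plant is entered twice, once as (A, B) and once as (B, A). Whether the note used this form of the intraclass coefficient is an assumption, and the counts below are hypothetical, not taken from the study.

```python
import math


def pearson(xs, ys):
    """Product-moment (interclass) correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def intraclass(xs, ys):
    """Double-entry intraclass correlation: each pair is entered in both orders."""
    return pearson(list(xs) + list(ys), list(ys) + list(xs))


# Hypothetical bract counts on the first (A) and second (B) branch terminals.
bracts_a = [10, 9, 11, 12, 10, 8]
bracts_b = [11, 10, 11, 13, 10, 9]

print(f"interclass r = {pearson(bracts_a, bracts_b):.3f}")
print(f"intraclass r = {intraclass(bracts_a, bracts_b):.3f}")
```

For positively correlated pairs the double-entry coefficient cannot exceed the interclass one, since it treats any difference in mean or spread between the two members as disagreement; the tabulated values show the same ordering.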

The relation between floral parts on B and those on T and A is shown by the following
multiple and partial correlation coefficients.

Description   |        Bracts         |         Rays
              | Annuals  | Biennials  | Annuals  | Biennials
R_B.AT        |  0.807   |   0.796    |  0.683   |   0.871
r_BA.T        |  0.636?  |   0.656    |  0.662   |   0.680





Again there are no significant differences between these coefficients, indicating that the
amount of relationship remains the same between these floral parts pertaining to annual and
biennial inflorescences.
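The multiple and partial coefficients follow from the three simple correlations among T, A and B by the standard formulas. In the sketch below, r_AB = 0.748 and r_TB = 0.644 are the annual bract values quoted earlier (the second on the reading that the garbled symbol is r_TB); the coefficient r_TA is not given in the note, so the value 0.5 is purely hypothetical and is used only to make the example run.

```python
import math


def partial_corr(r_ba, r_bt, r_at):
    """Partial correlation of B and A with T held constant (r_BA.T)."""
    return (r_ba - r_bt * r_at) / math.sqrt((1 - r_bt ** 2) * (1 - r_at ** 2))


def multiple_corr(r_bt, r_ba, r_at):
    """Multiple correlation of B on A and T jointly (R_B.AT)."""
    r_squared = (r_bt ** 2 + r_ba ** 2 - 2 * r_bt * r_ba * r_at) / (1 - r_at ** 2)
    return math.sqrt(r_squared)


r_ab = 0.748   # bracts, annuals, interclass value from the text
r_tb = 0.644   # bracts, annuals, read from the text as r_TB
r_ta = 0.5     # hypothetical: this coefficient is not reported in the note

r_partial = partial_corr(r_ab, r_tb, r_ta)
r_multiple = multiple_corr(r_tb, r_ab, r_ta)
print(f"r_BA.T = {r_partial:.3f}, R_B.AT = {r_multiple:.3f}")
```

The two quantities are linked by the identity 1 - R² = (1 - r_TB²)(1 - r_BA.T²), so the multiple coefficient can never be smaller than either simple correlation of B with T or with A.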




SUGGESTIONS FOR FURTHER STUDY 


It might be argued that biennial plants should naturally be larger in every way, since
these plants had a longer time in which to establish themselves than the annuals, and that the
root systems of the second season plants are much better for supporting the plants than those
of the first season. This may be true. To overcome this criticism and to make more reliable
comparisons between annual and biennial plants and inflorescences it might prove of real
value to secure seeds from one plant as done in this study, save seeds from the annual and
biennial flowers, plant these seeds under the same environmental conditions, and then
make the same comparisons between counts as were made in this study. Seeds should be planted
in the fall and in the spring. Investigations along these lines may produce more interesting results.


SUMMARY 


This study has shown that: 

1. The average number of branches on biennial inflorescences of Daucus carota is larger 
than the average number on annual inflorescences. 

2. The average number of inflorescences on biennials is larger than on annuals. 

3. The average number of bracts per biennial cluster is larger than the average per annual
cluster.

4. The average number of primary rays on biennial umbels is larger than that on annual 
umbels. 

5. The correlation coefficient between bracts and rays is about 0.60 for annual and
biennial clusters.

6. The size of plants and seasons (first and second) do not affect the amount of correlation 
between floral parts on stem terminals and branch terminals. 

7. The amount of correlation between certain floral parts on branch terminals and non- 
terminals is about the same for annuals as biennials. 

8. The amount of correlation between bracts and rays pertaining to first and second 
branch primary umbels is about the same for annuals as biennials. 


REFERENCES 


Baten, W. D. (1933). "A statistical study of Daucus carota L." Biometrika, 25, 186-95.
Baten, W. D. (1934). "A statistical study of Daucus carota L. II." Biometrika, 26, 443-68.








BOOKS RECEIVED 


Modern Machine Calculation with the Facit Machine Model Lx. By H. SABIELNY;
translated and revised by L. J. COMRIE and H. O. HARTLEY. London:
Scientific Computing Service, Ltd., 23 Bedford Square, W.C. 1. 1939.
Price 5s.

Statistical Method from the viewpoint of Quality Control. By WALTER A. SHEWHART;
edited by W. EDWARDS DEMING. Washington: The Graduate
School, Department of Agriculture. 1939.

Theory of Probability. By HAROLD JEFFREYS, F.R.S. Oxford: At the Clarendon
Press. 1939. Price 21s.

The Probability Integral. By the late W. F. SHEPPARD, F.R.S., being Vol. VII
of the British Association's series of Mathematical Tables. Completed and
edited by the Committee for the Calculation of Mathematical Tables.
Cambridge: At the University Press. 1939. Price 8s. 6d.

The Races of Europe. By CARLETON STEVENS COON. New York: The Macmillan
Company. 1939. Price 31s. 6d.

Tuberculosis and Social Conditions in England, with special reference to Young
Adults. (A statistical study.) By P. D'ARCY HART and G. PAYLING
WRIGHT. London: National Association for the Prevention of Tuberculosis.
1939. Price 3s.

Statistical Procedures and their Mathematical Bases. By CHARLES C. PETERS and
WALTER R. VAN VOORHIS. New York and London: McGraw-Hill Book
Company, Inc. 1940. Price $4.50.




















(All Rights reserved) 
BIOMETRIKA. Vol. XXXII, Part I
CONTENTS

A theory of randomness. By M. G. KENDALL
A geometrical analysis of the frequency distribution of the ratio between two variables. By C. NICHOLSON. With four Figures in the Text
The statistical significance of canonical correlations. By M. S. BARTLETT
On the limiting distribution of the canonical correlations. By P. L. HSU
The application of maximum likelihood to dosage-mortality curves. By F. GARWOOD
A note on further properties of statistical tests. By E. S. PEARSON (pp. 59-61)
Analysis of variance from the power function standpoint. By P. L. HSU (pp. 62-69)
On an optimum property of two important statistical tests. By J. B. SIMAIKA (pp. 70-80)

MISCELLANEA:
(i) A recurrence relation for the semi-invariants of Pearson curves. By M. G. KENDALL (pp. 81-82)
(ii) A comparison of annual and biennial inflorescences of Daucus carota (wild carrot). By W. D. BATEN (pp. 82-87)

Books received (p. 88)


A volume of Biometrika containing about 400 pages, with plates and tables, is normally issued annually. Owing to war
conditions, however, delay in issue is inevitable.

Papers for publication should either be sent to

PROFESSOR E. S. PEARSON, Department of Statistics, University College, London, W.C.1,
or if more convenient may be submitted through a member of the Editorial Committee, viz.
PROFESSOR HARALD CRAMÉR, University of Stockholm, Sweden.
DR R. C. GEARY, Statistics Branch, Department of Industry and Commerce, Dublin.
PROFESSOR M. GREENWOOD, F.R.S., London School of Hygiene and Tropical Medicine, London, W.C.1.
PROFESSOR J. B. S. HALDANE, F.R.S., University College, London, W.C.1.
DR G. M. MORANT, University College, London, W.C.1.
DR JOHN WISHART, School of Agriculture, Cambridge.

It is a condition of publication in Biometrika that the paper shall not already have been issued elsewhere, and will not be 
reprinted without leave of the Editors. 

Contributors receive 25 copies of their papers free. Joint authors 15 copies each. The price of additional copies which 
should be ordered when the author’s proof is returned will depend upon the number of pages, plates, etc. 

The subscription price, payable in advance, is Inland 45s. net per volume and Abroad 54s. net (including packing and 
postage). Owing to the scarcity of early volumes, the following rates must now be charged for complete sets. Vols. I-XXXI,
including XX*: £128 in buckram, £117 in wrappers, not including postage. Recent volumes may still be obtained at 
the wrapper price; this is 64s. inland, including postage. Standard buckram cases with Darwin block, price 3s. 6d. + the 
postage per volume. Index to Vols. I to V, 2s. net. Index to Vols. I to XV, 5s. net. Cheques must be made payable 
to Biometrika and sent to The Secretary, Biometrika Office, Department of Statistics, University College, London, W.C.1, 
to whom all orders for series, single copies and offprints should be addressed. All cheques must be properly stamped and 
should be crossed “a/c Biometrika Trust”. No foreign cheques can be accepted unless they are drawn in sterling and 
payable at a London agency. 








